Abstract

We prove new upper and lower bounds on the sample complexity of $(\varepsilon, \delta)$ differentially
private algorithms for releasing approximate answers to threshold functions. A threshold function
$c_x$ over a totally ordered domain $X$ evaluates to $c_x(y) = 1$ if $y \le x$, and evaluates to $0$
otherwise. We give the first nontrivial lower bound for releasing thresholds with $(\varepsilon, \delta)$
differential privacy, showing that the task is impossible over an infinite domain $X$, and moreover
requires sample complexity $n \ge \Omega(\log^* |X|)$, which grows with the size of the domain. Inspired
by the techniques used to prove this lower bound, we give an algorithm for releasing thresholds with
$n \le 2^{(1+o(1))\log^* |X|}$ samples. This improves the previous best upper bound of
$8^{(1+o(1))\log^* |X|}$ (Beimel et al., RANDOM ’13).

Our sample complexity upper and lower bounds also apply to the tasks of learning distributions
with respect to Kolmogorov distance and of properly PAC learning thresholds with
differential privacy. The lower bound gives the first separation between the sample complexity
of properly learning a concept class with $(\varepsilon, \delta)$ differential privacy and learning without privacy.
For properly learning thresholds in $\ell$ dimensions, this lower bound extends to
$n \ge \Omega(\ell \cdot \log^* |X|)$.

To obtain our results, we give reductions in both directions between releasing and properly
learning thresholds and the simpler interior point problem. Given a database $D$ of elements from
$X$, the interior point problem asks for an element between the smallest and largest elements in
$D$. We introduce new recursive constructions for bounding the sample complexity of the interior
point problem, as well as further reductions and techniques for proving impossibility results for
other basic problems in differential privacy.
1 Introduction


The line of work on differential privacy [DMNS06] is aimed at enabling useful statistical analyses on 
privacy-sensitive data while providing strong privacy protections for individual-level information. 
Privacy is achieved in differentially private algorithms through randomization and the introduction 
of “noise” to obscure the effect of each individual, and thus differentially private algorithms can be 
less accurate than their non-private analogues. Nevertheless, by now a rich literature has shown 
that many data analysis tasks of interest are compatible with differential privacy, and generally 
the loss in accuracy vanishes as the number n of individuals tends to infinity. However, in many
cases, there is still a price of privacy hidden in these asymptotics: in the rate at which the loss
in accuracy vanishes, and in how large n needs to be to start getting accurate results at all (the
“sample complexity”).

In this paper, we consider the price of privacy for three very basic types of computations
involving threshold functions: query release, distribution learning with respect to Kolmogorov
distance, and (proper) PAC learning. In all cases, we show for the first time that accomplishing these
tasks with differential privacy is impossible when the data universe is infinite (e.g. $\mathbb{N}$ or $[0,1]$) and in
fact that the sample complexity must grow with the size $|X|$ of the data universe: $n = \Omega(\log^* |X|)$,
which is tantalizingly close to the previous upper bound of $n = 2^{O(\log^* |X|)}$ [BNS13b]. We also
provide simpler and somewhat improved upper bounds for these problems, reductions between
these problems and other natural problems, as well as additional techniques that allow us to prove
impossibility results for infinite domains even when the sample complexity does not need to grow
with the domain size (e.g. for PAC learning of point functions with “pure” differential privacy).

1.1 Differential Privacy 

We recall the definition of differential privacy. We think of a dataset as consisting of n rows from a 
data universe X, where each row corresponds to one individual. Differential privacy requires that 
no individual’s data has a significant effect on the distribution of what we output. 

Definition 1.1. A randomized algorithm $M : X^n \to Y$ is $(\varepsilon, \delta)$ differentially private if for every
two datasets $x, x' \in X^n$ that differ on one row, and every set $T \subseteq Y$, we have

$$\Pr[M(x) \in T] \le e^{\varepsilon} \cdot \Pr[M(x') \in T] + \delta.$$
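
As a concrete instance of Definition 1.1, the following sketch checks the inequality for randomized response on a single private bit (so $n = 1$ and the two neighboring datasets are the bits 0 and 1). Randomized response and the helper names `randomized_response_dist` and `satisfies_dp` are illustrative assumptions, not constructions from this paper.

```python
import math

def randomized_response_dist(bit, eps):
    """Output distribution of randomized response on one private bit."""
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}

def satisfies_dp(dist_a, dist_b, eps, delta=0.0, tol=1e-12):
    """Check Pr_a[T] <= e^eps * Pr_b[T] + delta for every output set T.

    On a finite output space the worst-case T collects exactly the outputs
    y with dist_a[y] > e^eps * dist_b[y], so summing those excesses suffices.
    """
    outputs = set(dist_a) | set(dist_b)
    excess = sum(max(0.0, dist_a.get(y, 0.0) - math.exp(eps) * dist_b.get(y, 0.0))
                 for y in outputs)
    return excess <= delta + tol

eps = 0.1
d0 = randomized_response_dist(0, eps)
d1 = randomized_response_dist(1, eps)
print(satisfies_dp(d0, d1, eps), satisfies_dp(d1, d0, eps))  # the definition is symmetric
print(satisfies_dp(d0, d1, eps / 2))                         # a stricter eps is violated
```

The check must hold in both directions because the definition quantifies over every ordered pair of neighboring datasets.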

The original definition from [DMNS06] had $\delta = 0$, and is sometimes referred to as pure
differential privacy. However, a number of subsequent works have shown that allowing a small (but
negligible) value of $\delta$, referred to as approximate differential privacy, can provide substantial gains
over pure differential privacy [DL09, HT10, DRV10, De12, BNS13b].

The common setting of parameters is to take $\varepsilon$ to be a small constant and $\delta$ to be negligible
in $n$ (or a given security parameter). To simplify the exposition, we fix $\varepsilon = 0.1$ and fix $\delta$ to a
negligible function of $n$ throughout the introduction (but precise dependencies on these parameters
are given in the main body).

1.2 Private Query Release 

Given a set $Q$ of queries $q : X^n \to \mathbb{R}$, the query release problem for $Q$ is to output accurate answers
to all queries in $Q$. That is, we want a differentially private algorithm $M : X^n \to \mathbb{R}^{|Q|}$ such that
for every dataset $D \in X^n$, with high probability over $y \leftarrow M(D)$, we have $|y_q - q(D)| \le \alpha$ for all
$q \in Q$, for an error parameter $\alpha$.

A special case of interest is the case where $Q$ consists of counting queries. In this case, we are
given a set $Q$ of predicates $q : X \to \{0,1\}$ on individual rows, and then extend them to databases
by averaging. That is, $q(D) = (1/n) \sum_{i=1}^{n} q(D_i)$ counts the fraction of the population that satisfies
predicate $q$.

The query release problem for counting queries is one of the most widely studied problems in
differential privacy. Early work on differential privacy implies that for every family of counting
queries $Q$, the query release problem for $Q$ has sample complexity at most $O(\sqrt{|Q|})$ [DN03, DN04,
BDMN05, DMNS06]. That is, there is an $n_0 = O(\sqrt{|Q|})$ such that for all $n \ge n_0$, there is a
differentially private mechanism $M : X^n \to \mathbb{R}^{|Q|}$ that solves the query release problem for $Q$ with
error at most $\alpha = 0.01$. (Again, we treat $\alpha$ as a small constant to avoid an extra parameter in the
introduction.)

Remarkably, Blum, Ligett, and Roth [BLR08] showed that if the data universe $X$ is finite, then
the sample complexity grows much more slowly with $|Q|$: indeed the query release problem for
$Q$ has sample complexity at most $O((\log |X|)(\log |Q|))$. Hardt and Rothblum [HR10] improved
this bound to $O(\log |Q| \cdot \sqrt{\log |X|})$, which was recently shown to be optimal for some families
$Q$ [BUV14].

However, for specific query families of interest, the sample complexity can be significantly
smaller. In particular, consider the family of point functions over $X$, which is the family $\{q_x\}_{x \in X}$
where $q_x(y)$ is 1 iff $y = x$, and the family of threshold functions over $X$, where $q_x(y)$ is 1 iff $y \le x$
(where $X$ is a totally ordered set). The query release problems for these families correspond to
the very natural tasks of producing $\ell_\infty$ approximations to the histogram and to the cumulative
distribution function of the empirical data distribution, respectively. For point functions, Beimel,
Nissim, and Stemmer [BNS13b] showed that the sample complexity has no dependence on $|X|$ (or
$|Q|$, since $|Q| = |X|$ for these families). In the case of threshold functions, they showed that it has
at most a very mild dependence on $|X| = |Q|$, namely $2^{O(\log^* |X|)}$.
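
To make the two query families concrete, the sketch below evaluates point and threshold counting queries exactly (with no privacy) on a toy database. The domain $\{0, \ldots, 9\}$, the database, and the function names are assumptions chosen for illustration.

```python
def threshold(x):
    """Threshold predicate: 1 iff y <= x."""
    return lambda y: 1 if y <= x else 0

def point(x):
    """Point predicate: 1 iff y == x."""
    return lambda y: 1 if y == x else 0

def counting_query(predicate, database):
    """q(D) = (1/n) * sum_i q(D_i): the fraction of rows satisfying the predicate."""
    return sum(predicate(row) for row in database) / len(database)

D = [3, 1, 4, 1, 5, 9, 2, 6]
domain = range(10)
cdf = [counting_query(threshold(x), D) for x in domain]   # empirical CDF
hist = [counting_query(point(x), D) for x in domain]      # normalized histogram
print(cdf)
print(hist)
```

Answering every threshold query yields the empirical cumulative distribution function, and answering every point query yields the normalized histogram, which is the correspondence described above.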

Thus, the following basic questions remained open: Are there differentially private algorithms 
for releasing threshold functions over an infinite data universe (such as N or [0,1])? If not, does the 
sample complexity for releasing threshold functions grow with the size |X| of the data universe? 

We resolve these questions: 

Theorem 1.2. The sample complexity of releasing threshold functions over a data universe $X$ with
differential privacy is at least $\Omega(\log^* |X|)$. In particular, there is no differentially private algorithm
for releasing threshold functions over an infinite data universe.

In addition, inspired by the ideas in our lower bound, we present a simplification of the algorithm
of [BNS13b] and improve the sample complexity to $2^{(1+o(1))\log^* |X|}$ (from roughly $8^{\log^* |X|}$). Closing
the gap between the lower bound of $\approx \log^* |X|$ and the upper bound of $\approx 2^{\log^* |X|}$ remains an
intriguing open problem.

We remark that in the case of pure differential privacy ($\delta = 0$), a sample complexity lower
bound of $n = \Omega(\log |X|)$ follows from a standard “packing argument” [HT10, BKN10]. For point
functions, this is matched by the standard Laplace mechanism [DMNS06]. For threshold functions,
a matching upper bound was recently obtained [RR14], building on a construction of [DNPR10].
We note that these algorithms have a slightly better dependence on the accuracy parameter $\alpha$ than
our algorithm (linear rather than nearly linear in $1/\alpha$). In general, while packing arguments often
yield tight lower bounds for pure differential privacy, they fail badly for approximate differential
privacy, for which much less is known.

There is also a beautiful line of work on characterizing the $\ell_2$-accuracy achievable for query
release in terms of other measures of the “complexity” of the family $Q$ (such as “hereditary
discrepancy”) [HT10, BDKT12, MN12, NTZ13]. However, the characterizations given in these works
are tight only up to factors of $\mathrm{poly}(\log |X|, \log |Q|)$ and thus do not give good estimates of the
sample complexity (which is at most $(\log |X|)(\log |Q|)$ even for pure differential privacy, as mentioned
above).

1.3 Private Distribution Learning 

A fundamental problem in statistics is distribution learning, which is the task of learning an
unknown distribution $\mathcal{D}$ given i.i.d. samples from it. The query release problem for threshold functions
is closely related to the problem of learning an arbitrary distribution $\mathcal{D}$ on $\mathbb{R}$ up to small error in
Kolmogorov (or CDF) distance: Given $n$ i.i.d. samples $x_i \leftarrow_R \mathcal{D}$, the goal of a distribution learner
is to produce a CDF $F : X \to [0,1]$ such that $|F(x) - F_{\mathcal{D}}(x)| \le \alpha$ for all $x \in X$, where $\alpha$ is
an accuracy parameter. While closeness in Kolmogorov distance is a relatively weak measure of
closeness for distributions, under various structural assumptions (e.g. the two distributions have
probability mass functions that cross in a constant number of locations), it implies closeness in the
much stronger notion of total variation distance. Other works have developed additional techniques
that use weak hypotheses learned under Kolmogorov distance to test and learn distributions under
total variation distance (e.g. [DDS+13, DDS14, DK14]).

The well-known Dvoretzky-Kiefer-Wolfowitz inequality [DKW56] implies that without privacy,
any distribution over $X$ can be learned to within constant error with $O(1)$ samples. On the
other hand, we show that with approximate differential privacy, the task of releasing thresholds 
is essentially equivalent to distribution learning. As a consequence, with approximate differential 
privacy, distribution learning instead requires sample complexity that grows with the size of the 
domain. 

Theorem 1.3. The sample complexity of learning arbitrary distributions on a domain $X$ with
differential privacy is at least $\Omega(\log^* |X|)$.

We prove Theorem 1.3 by showing that the problem of distribution learning with respect to
Kolmogorov distance with differential privacy is essentially equivalent to query release for threshold
functions. Indeed, query release of threshold functions amounts to approximating the empirical
distribution of a dataset with respect to Kolmogorov distance. Approximating the empirical
distribution is of course trivial without privacy (since we are given it as input), but with privacy, it
turns out to have essentially the same sample complexity as the usual distribution learning problem
from i.i.d. samples. More generally, query release for a family $Q$ of counting queries is equivalent to
distribution learning with respect to the distance measure
$d_Q(\mathcal{D}, \mathcal{D}') = \sup_{q \in Q} |\mathbb{E}[q(\mathcal{D})] - \mathbb{E}[q(\mathcal{D}')]|$.
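
The connection between threshold queries and Kolmogorov distance can be seen in a small non-private sketch: the largest gap between two empirical CDFs over the domain is exactly the largest disagreement on any threshold query. The two sample lists and the function names are illustrative assumptions.

```python
def empirical_cdf(samples):
    """Return F with F(x) = fraction of samples that are <= x."""
    n = len(samples)
    return lambda x: sum(1 for s in samples if s <= x) / n

def kolmogorov_distance(cdf_f, cdf_g, domain):
    """Maximum over a finite ordered domain of |F(x) - G(x)|."""
    return max(abs(cdf_f(x) - cdf_g(x)) for x in domain)

domain = range(4)
F = empirical_cdf([0, 0, 1, 3])
G = empirical_cdf([1, 2, 2, 3])
print(kolmogorov_distance(F, G, domain))
```

Here `F(x)` is the answer to the threshold query at `x` on the first dataset, so a vector of threshold answers with error at most $\alpha$ is a CDF within Kolmogorov distance $\alpha$ of the empirical one.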

1.4 Private PAC Learning 

Kasiviswanathan et al. [KLN+11] defined private PAC learning as a combination of probably
approximately correct (PAC) learning [Val84] and differential privacy. Recall that a PAC learning
algorithm takes some $n$ labeled examples $(x_i, c(x_i)) \in X \times \{0,1\}$ where the $x_i$'s are i.i.d. samples
of an arbitrary and unknown distribution on a data universe $X$ and $c : X \to \{0,1\}$ is an unknown
concept from some concept class $C$. The goal of the learning algorithm is to output a hypothesis
$h : X \to \{0,1\}$ that approximates $c$ well on the unknown distribution. We are interested in PAC
learning algorithms $L : (X \times \{0,1\})^n \to H$ that are also differentially private. Here $H$ is the
hypothesis class; if $H \subseteq C$, then $L$ is called a proper learner.

As with query release and distribution learning, a natural problem is to characterize the sample
complexity, that is, the minimum number $n$ of samples needed to achieve differentially private PAC
learning for a given concept class $C$. Without privacy, it is well-known that the sample
complexity of (proper) PAC learning is proportional to the Vapnik-Chervonenkis (VC) dimension of
the class $C$ [VC71, BEHW89, EHKV89]. In the initial work on differentially private learning,
Kasiviswanathan et al. [KLN+11] showed that $O(\log |C|)$ labeled examples suffice for privately
learning any concept class $C$.^1 The VC dimension of a concept class $C$ is always at most $\log |C|$,
but is significantly lower for many interesting classes. Hence, the results of [KLN+11] left open the
possibility that the sample complexity of private learning may be significantly higher than that of
non-private learning.

In the case of pure differential privacy ($\delta = 0$), this gap in the sample complexity was shown to
be unavoidable in general. Beimel, Kasiviswanathan, and Nissim [BKN10] considered the concept
class $C$ of point functions over a data universe $X$, which have VC dimension 1 and hence can
be (properly) learned without privacy with $O(1)$ samples. In contrast, they showed that proper
PAC learning with pure differential privacy requires sample complexity $\Omega(\log |X|) = \Omega(\log |C|)$.
Feldman and Xiao [FX14] showed a similar separation even for improper learning: the class $C$
of threshold functions over $X$ also has VC dimension 1, but PAC learning with pure differential
privacy requires sample complexity $\Omega(\log |X|) = \Omega(\log |C|)$.

For approximate differential privacy ($\delta > 0$), however, it was still open whether there is an
asymptotic gap between the sample complexity of private learning and non-private learning. Indeed,
Beimel et al. [BNS13b] showed that point functions can be properly learned with approximate
differential privacy using $O(1)$ samples (i.e. with no dependence on $|X|$). For threshold functions,
they exhibited a proper learner with sample complexity $2^{O(\log^* |X|)}$, but it was conceivable that the
sample complexity could also be reduced to $O(1)$.

We prove that the sample complexity of proper PAC learning with approximate differential 
privacy can be asymptotically larger than the VC dimension: 

Theorem 1.4. The sample complexity of properly learning threshold functions over a data universe
$X$ with differential privacy is at least $\Omega(\log^* |X|)$.

This lower bound extends to the concept class of $\ell$-dimensional thresholds. An $\ell$-dimensional
threshold function, defined over the domain $X^\ell$, is a conjunction of $\ell$ threshold functions, each
defined on one component of the domain. This shows that our separation between the sample
complexity of private and non-private learning applies to concept classes of every VC dimension.

Theorem 1.5. For every finite, totally ordered $X$ and $\ell \in \mathbb{N}$, the sample complexity of properly
learning the class $C$ of $\ell$-dimensional threshold functions on $X^\ell$ with differential privacy is at least

$$\Omega(\ell \cdot \log^* |X|) = \Omega(\mathrm{VC}(C) \cdot \log^* |X|).$$

Based on these results, it would be interesting to fully characterize the difference between
the sample complexity of proper non-private learners and of proper learners with (approximate)
differential privacy. Furthermore, our results still leave open the possibility that improper PAC
learning with (approximate) differential privacy has sample complexity $O(\mathrm{VC}(C))$. We consider
this to be an important question for future work.

^1 As with the query release discussion, we omit the dependency on all parameters except for $|C|$, $|X|$ and $\mathrm{VC}(C)$.

We also present a new result on improper learning of point functions with pure differential
privacy over infinite countable domains. Beimel et al. [BKN10, BNS13a] showed that for finite data
universes $X$, the sample complexity of improperly learning point functions with pure differential
privacy does not grow with $|X|$. They also gave a mechanism for learning point functions over
infinite domains (e.g. $X = \mathbb{N}$), but the outputs of their mechanism do not have a finite description
length (and hence cannot be implemented by an algorithm). We prove that this is inherent:

Theorem 1.6. For every infinite domain $X$, countable hypothesis space $H$, and $n \in \mathbb{N}$, there is no
(even improper) PAC learner $L : (X \times \{0,1\})^n \to H$ for point functions over $X$ that satisfies pure
differential privacy.

1.5 Techniques 

Our results for query release and proper learning of threshold functions are obtained by analyzing 
the sample complexity of a related but simpler problem, which we call the interior-point problem. 
Here we want a mechanism $M : X^n \to X$ (for a totally ordered data universe $X$) such that for every
database $D \in X^n$, with high probability we have $\min_i D_i \le M(D) \le \max_i D_i$. We give reductions
showing that the sample complexity of this problem is equivalent to the other ones we study:

Theorem 1.7. Over every totally ordered data universe $X$, the following four problems have the
same sample complexity (up to constant factors) under differential privacy:

1. The interior-point problem. 

2. Query release for threshold functions. 

3. Distribution learning (with respect to Kolmogorov distance). 

4. Proper PAC learning of threshold functions.

Thus we obtain our lower bounds and our simplified and improved upper bounds for query 
release and proper learning by proving such bounds for the interior-point problem, such as: 

Theorem 1.8. The sample complexity for solving the interior-point problem over a data universe
$X$ with differential privacy is $\Omega(\log^* |X|)$.

Note that for every fixed distribution $\mathcal{D}$ over $X$ there exists a simple differentially private
algorithm for solving the interior-point problem (w.h.p.) over databases sampled i.i.d. from $\mathcal{D}$:
simply output a point $z$ s.t. $\Pr_{x \leftarrow \mathcal{D}}[x \ge z] = 1/2$. Hence, in order to prove Theorem 1.8, we
show a (correlated) distribution $\mathcal{D}$ over databases of size $n \approx \log^* |X|$ on which privately solving
the interior-point problem is impossible. The construction is recursive: we use a hard distribution
over databases of size $(n - 1)$ over a data universe of size logarithmic in $|X|$ to construct the hard
distribution over databases of size $n$ over $X$.

By another reduction to the interior-point problem, we show an impossibility result for the 
following undominated-point problem: 

Theorem 1.9. For every $n \in \mathbb{N}$, there does not exist a differentially private mechanism $M : \mathbb{N}^n \to \mathbb{N}$
with the property that for every dataset $D \in \mathbb{N}^n$, with high probability $M(D) \ge \min_i D_i$.

Note that for the above problem, one cannot hope to construct a single distribution over
databases that every private mechanism fails on. The reason is that for any such distribution $\mathcal{D}$,
and any desired failure probability $\beta$, there is some number $K$ for which $\Pr_{D \leftarrow \mathcal{D}}[\max D > K] \le \beta$,
and hence the mechanism that always outputs $K$ solves the problem. Hence, given a mechanism
$M$ we must tailor a hard distribution $\mathcal{D}_M$. We use a similar mechanism-dependent approach
to prove Theorem 1.6.

2 Preliminaries 

Throughout this work, we use the convention that $[n] = \{0, 1, \ldots, n-1\}$ and write $\log$ for $\log_2$. We
use $\mathrm{THRESH}_X$ to denote the set of all threshold functions over a totally ordered domain $X$. That is,

$$\mathrm{THRESH}_X = \{c_x : x \in X\} \quad \text{where } c_x(y) = 1 \text{ iff } y \le x.$$


2.1 Differential Privacy 

Our algorithms and reductions rely on a number of basic results about differential privacy. Early 
work on differential privacy showed how to solve the query release problem by adding independent 
Laplace noise to each exact query answer. A real-valued random variable is distributed as $\mathrm{Lap}(b)$
if its probability density function is $f(x) = \frac{1}{2b} \exp(-|x|/b)$. We say a function $f : X^n \to \mathbb{R}^m$ has
sensitivity $\Delta$ if for all neighboring $D, D' \in X^n$, it holds that $\|f(D) - f(D')\|_1 \le \Delta$.

Theorem 2.1 (The Laplace Mechanism [DMNS06]). Let $f : X^n \to \mathbb{R}^m$ be a sensitivity $\Delta$ function.
The mechanism $A$ that on input $D \in X^n$ adds independent noise with distribution $\mathrm{Lap}(\Delta/\varepsilon)$ to
each coordinate of $f(D)$ preserves $\varepsilon$-differential privacy.
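
A minimal sketch of Theorem 2.1, applied to a small vector of threshold counting queries, might look as follows. The sampling trick (a Laplace variable as the difference of two i.i.d. exponentials), the toy database, and all names are assumptions for illustration; a production implementation would also need to guard against floating-point side channels, which this sketch ignores.

```python
import random

def laplace_noise(scale, rng):
    """Sample Lap(scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(answers, sensitivity, eps, rng):
    """Add independent Lap(sensitivity / eps) noise to each coordinate of f(D)."""
    scale = sensitivity / eps
    return [a + laplace_noise(scale, rng) for a in answers]

rng = random.Random(0)
D = [3, 1, 4, 1, 5, 9, 2, 6]
thresholds = [2, 5, 8]
exact = [sum(1 for y in D if y <= x) / len(D) for x in thresholds]
# Replacing one row moves each of the 3 fractions by at most 1/n,
# so the L1 sensitivity of the answer vector is at most 3/n.
noisy = laplace_mechanism(exact, sensitivity=len(thresholds) / len(D), eps=0.1, rng=rng)
print(exact)
print(noisy)
```

With $n = 8$ and $\varepsilon = 0.1$ the noise scale is $3.75$, far larger than the answers themselves, which illustrates why accuracy only becomes meaningful once $n$ is large relative to the number of queries.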

We will present algorithms that access their input database using (several) differentially private 
mechanisms and use the following composition theorem to prove their overall privacy guarantee. 

Lemma 2.2 (Composition, e.g. [DL09]). Let $M_1 : X^n \to \mathcal{R}_1$ be $(\varepsilon_1, \delta_1)$-differentially private. Let
$M_2 : X^n \times \mathcal{R}_1 \to \mathcal{R}_2$ be $(\varepsilon_2, \delta_2)$-differentially private for any fixed value of its second argument.
Then the composition $M(D) = M_2(D, M_1(D))$ is $(\varepsilon_1 + \varepsilon_2, \delta_1 + \delta_2)$-differentially private.

3 The Interior Point Problem 

3.1 Definition 

In this work we exhibit a close connection between the problems of privately learning and releasing 
threshold queries, distribution learning, and solving the interior point problem as defined below. 

Definition 3.1. An algorithm $A : X^n \to X$ solves the interior point problem on $X$ with error
probability $\beta$ if for every $D \in X^n$,

$$\Pr[\min D \le A(D) \le \max D] \ge 1 - \beta,$$

where the probability is taken over the coins of $A$. The sample complexity of the algorithm $A$ is
the database size $n$.

We call a solution $x$ with $\min D \le x \le \max D$ an interior point of $D$. Note that $x$ need not be
a member of the database $D$.
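
For intuition, the success condition of Definition 3.1 and a trivial non-private solver can be written as below. The function names and the toy database are assumptions; the solver offers no privacy and only shows how easy the problem is without that constraint.

```python
def is_interior_point(x, database):
    """True iff min D <= x <= max D."""
    return min(database) <= x <= max(database)

def nonprivate_interior_point(database):
    """Non-private baseline: any order statistic of D, here a median, is an interior point."""
    return sorted(database)[len(database) // 2]

D = [3, 9, 4]
print(is_interior_point(5, D))                 # 5 is interior although it is not a row of D
print(is_interior_point(2, D))                 # 2 lies below every row
print(nonprivate_interior_point(D))
```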
3.2 Lower Bound

We now prove our lower bound on the sample complexity of private algorithms for solving the 
interior point problem. 

Theorem 3.2. Fix any constant $0 < \varepsilon < 1/4$. Let $\delta(n) \le 1/(50 n^2)$. Then for every positive integer
$n$, solving the interior point problem on $X$ with probability at least $3/4$ and with $(\varepsilon, \delta(n))$-differential
privacy requires sample complexity $n \ge \Omega(\log^* |X|)$.

Our choice of $\delta = O(1/n^2)$ is unimportant; any monotonically non-increasing convergent series
will do. To prove the theorem, we inductively construct a sequence of database distributions $\{\mathcal{D}_n\}$
supported on data universes $[S(n)]$ (for $S(n+1) = 2^{\tilde{O}(S(n))}$) over which any differentially private
mechanism using $n$ samples must fail to solve the interior point problem. Given a hard distribution
$\mathcal{D}_n$ over $n$ elements $(x_1, x_2, \ldots, x_n)$ from $[S(n)]$, we construct a hard distribution $\mathcal{D}_{n+1}$ over elements
$(y_0, y_1, \ldots, y_n)$ from $[S(n+1)]$ by setting $y_0$ to be a random number, and letting each other $y_i$ agree
with $y_0$ on the $x_i$ most significant digits. We then show that if $y$ is the output of any differentially
private interior point mechanism on $(y_0, \ldots, y_n)$, then with high probability, $y$ agrees with $y_0$ on at
least $\min x_i$ entries and at most $\max x_i$ entries. Thus, a private mechanism for solving the interior
point problem on $\mathcal{D}_{n+1}$ can be used to construct a private mechanism for $\mathcal{D}_n$, and so $\mathcal{D}_{n+1}$ must
also be a hard distribution.

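The inductive lemma we prove depends on a number of parameters we now define. Fix
$\frac{1}{4} > \varepsilon, \beta > 0$. Let $\delta(n)$ be any positive non-increasing sequence for which

$$P_n \triangleq \frac{e^{\varepsilon}}{e^{\varepsilon} + 1} + (e^{\varepsilon} + 1) \sum_{j=1}^{n} \delta(j) \le 1 - \beta$$

for every $n$. In particular, it suffices that

$$\sum_{n=1}^{\infty} \delta(n) \le \frac{\frac{1}{3} - \beta}{e^{\varepsilon} + 1}.$$

Let $b(n) = 1/\delta(n)$ and define the function $S$ recursively by

$$S(1) = 2 \quad \text{and} \quad S(n+1) = b(n)^{S(n)}.$$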


Lemma 3.3. For every positive integer $n$, there exists a distribution $\mathcal{D}_n$ over databases
$D \in [S(n)]^n = \{0, 1, \ldots, S(n) - 1\}^n$ such that for every $(\varepsilon, \delta(n))$-differentially private mechanism $\mathcal{M}$,

$$\Pr[\min D \le \mathcal{M}(D) \le \max D] \le P_n,$$

where the probability is taken over $D \leftarrow_R \mathcal{D}_n$ and the coins of $\mathcal{M}$.

In this section, we give a direct proof of the lemma and in Appendix B, we show how the 
lemma follows from the construction of a new combinatorial object we call an “interior point 
fingerprinting code.” This is a variant on traditional fingerprinting codes, which have been used 
recently to show nearly optimal lower bounds for other problems in approximate differential privacy 
[BUV14, DTTZ14, BST14]. 
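Proof. The proof is by induction on $n$. We first argue that the claim holds for $n = 1$ by letting $\mathcal{D}_1$
be uniform over the singleton databases $(0)$ and $(1)$. To that end let $x \leftarrow_R \mathcal{D}_1$, write $\bar{x} = 1 - x$, and
note that for any $(\varepsilon, \delta(1))$-differentially private mechanism $\mathcal{M}_0 : \{0,1\} \to \{0,1\}$ it holds that

$$\Pr[\mathcal{M}_0(x) = x] \le e^{\varepsilon} \Pr[\mathcal{M}_0(\bar{x}) = x] + \delta(1) = e^{\varepsilon}(1 - \Pr[\mathcal{M}_0(\bar{x}) = \bar{x}]) + \delta(1),$$

giving the desired bound on $\Pr[\mathcal{M}_0(x) = x]$: since $x$ and $\bar{x}$ are identically distributed, rearranging
yields $\Pr[\mathcal{M}_0(x) = x] \le (e^{\varepsilon} + \delta(1)) / (e^{\varepsilon} + 1)$.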

Now inductively suppose we have a distribution $\mathcal{D}_n$ that satisfies the claim. We construct a
distribution $\mathcal{D}_{n+1}$ on databases $(y_0, y_1, \ldots, y_n) \in [S(n+1)]^{n+1}$ that is sampled as follows:

• Sample $(x_1, \ldots, x_n) \leftarrow_R \mathcal{D}_n$.

• Sample a uniformly random $y_0 \leftarrow_R [S(n+1)]$. We write the base $b(n)$ representation of $y_0$ as
$y_0^{(1)} y_0^{(2)} \cdots y_0^{(S(n))}$.

• For each $i = 1, \ldots, n$ let $y_i$ be a base $b(n)$ number (written $y_i^{(1)} y_i^{(2)} \cdots y_i^{(S(n))}$) that agrees
with the base $b(n)$ representation of $y_0$ on the first $x_i$ digits and contains a random sample
from $[b(n)]$ in every index thereafter.
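
The sampling step above can be sketched as follows, representing each $y_i$ as a list of base-$b$ digits with the most significant digit first. The toy parameters (`base=10`, four digits, the database `xs`) and the helper names are assumptions for illustration and are far smaller than the $b(n)$ and $S(n)$ used in the proof.

```python
import random

def sample_next_level(xs, base, num_digits, rng):
    """Build (y_0, ..., y_n) from (x_1, ..., x_n), each y a list of base-`base` digits."""
    y0 = [rng.randrange(base) for _ in range(num_digits)]
    ys = [y0]
    for x in xs:
        # Copy the first x digits of y0, then fill the remaining positions with fresh digits.
        ys.append(y0[:x] + [rng.randrange(base) for _ in range(num_digits - x)])
    return ys

def common_prefix_len(a, b):
    """Length of the longest common prefix of two digit lists."""
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return k

rng = random.Random(0)
xs = [1, 3, 0]   # a toy database over [S(n)] with S(n) = 4
ys = sample_next_level(xs, base=10, num_digits=4, rng=rng)
print([common_prefix_len(ys[0], y) for y in ys[1:]])   # each entry is at least the matching x_i
```

A prefix can be longer than $x_i$ when a fresh random digit happens to coincide with the corresponding digit of $y_0$, which occurs with probability $1/b(n)$ per position.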


Suppose for the sake of contradiction that there were an $(\varepsilon, \delta(n+1))$-differentially private mechanism
$\mathcal{M}$ that could solve the interior point problem on $\mathcal{D}_{n+1}$ with probability greater than $P_{n+1}$. We
use $\mathcal{M}$ to construct the following private mechanism $\mathcal{M}'$ for solving the interior point problem on
$\mathcal{D}_n$, giving the desired contradiction:


Algorithm 1 $\mathcal{M}'(D)$

Input: Database $D = (x_1, \ldots, x_n) \in [S(n)]^n$

1. Construct $\hat{D} = (y_0, \ldots, y_n)$ by sampling from $\mathcal{D}_{n+1}$, but starting with the database $D$. That
is, sample $y_0$ uniformly at random and set every other $y_i$ to be a random base $b(n)$ string
that agrees with $y_0$ on the first $x_i$ digits.

2. Compute $\hat{y} \leftarrow_R \mathcal{M}(\hat{D})$.

3. Return the length of the longest prefix of $\hat{y}$ (in base $b(n)$ notation) that agrees with $y_0$.
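
The three steps of Algorithm 1 can be sketched as a wrapper around an arbitrary mechanism for $n + 1$ points. The stand-in mechanisms below are deliberately non-private and, like the toy parameters and names, are assumptions used only to exercise the reduction.

```python
import random

def common_prefix_len(a, b):
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return k

def reduce_mechanism(mech, xs, base, num_digits, rng):
    """Run a mechanism for n+1 points on the lifted database; return the prefix length."""
    y0 = [rng.randrange(base) for _ in range(num_digits)]                  # Step 1: sample y_0
    d_hat = [y0] + [y0[:x] + [rng.randrange(base) for _ in range(num_digits - x)]
                    for x in xs]                                           # Step 1: lift each x_i
    y_hat = mech(d_hat)                                                    # Step 2
    return common_prefix_len(y_hat, y0)                                    # Step 3

def median_mechanism(db):
    """NON-private stand-in: returns a median row, which is an interior point."""
    return sorted(db)[len(db) // 2]

xs = [1, 2, 3]
print(reduce_mechanism(median_mechanism, xs, 10, 4, random.Random(0)))
print(reduce_mechanism(lambda db: db[0], xs, 10, 4, random.Random(0)))    # leaks y_0 itself
```

Any interior point of the lifted database shares the first $\min D$ digits with $y_0$, so the returned length is never below $\min D$. The second stand-in returns $y_0$ itself and therefore yields the full length, which exceeds $\max D$; the proof rules out such behavior only because $\mathcal{M}$ is differentially private.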


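The mechanism $\mathcal{M}'$ is also $(\varepsilon, \delta(n+1))$-differentially private, since for all pairs of adjacent
databases $D \sim D'$ and every $T \subseteq [S(n)]$, writing $\delta = \delta(n+1)$,

$$\begin{aligned}
\Pr[\mathcal{M}'(D) \in T] &= \mathbb{E}_{y_0 \leftarrow_R [S(n+1)]} \Pr[\mathcal{M}(\hat{D}) \in \hat{T} \mid y_0] \\
&\le \mathbb{E}_{y_0 \leftarrow_R [S(n+1)]} \left( e^{\varepsilon} \Pr[\mathcal{M}(\hat{D}') \in \hat{T} \mid y_0] + \delta \right) \quad \text{since } \hat{D} \sim \hat{D}' \text{ for fixed } y_0 \\
&= e^{\varepsilon} \Pr[\mathcal{M}'(D') \in T] + \delta,
\end{aligned}$$

where $\hat{T}$ is the set of $\hat{y}$ that agree with $y_0$ in exactly the first $x$ entries for some $x \in T$.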

Now we argue that $\mathcal{M}'$ solves the interior point problem on $\mathcal{D}_n$ with probability greater than $P_n$.
First we show that $x \ge \min D$ with probability greater than $P_{n+1}$. Observe that by construction,
all the elements of $\hat{D}$ agree in at least the first $\min D$ digits, and hence so does any interior point
of $\hat{D}$. Therefore, if $\mathcal{M}$ succeeds in outputting an interior point $\hat{y}$ of $\hat{D}$, then $\hat{y}$ must in particular
agree with $y_0$ in at least $\min D$ digits, so $x \ge \min D$.

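Now we use the privacy that $\mathcal{M}$ provides to $y_0$ to show that $x \le \max D$ except with probability
at most $e^{\varepsilon}/b(n) + \delta(n+1)$. Fix a database $D$. Let $w = \max D$, and fix all the randomness of $\mathcal{M}'$
but the $(w+1)$st entry of $y_0$ (note that since $w = \max D$, this fixes $y_1, \ldots, y_n$). Since the $(w+1)$st
entry of $y_0$ is still a uniformly random element of $[b(n)]$, the privately produced entry $\hat{y}^{(w+1)}$ should
not be able to do much better than randomly guessing $y_0^{(w+1)}$. Formally, for each $z \in [b(n)]$, let
$\hat{D}_z$ denote the database $\hat{D}$ with $y_0^{(w+1)}$ set to $z$ and everything else fixed as above. Then by the
differential privacy of $\mathcal{M}$,

$$\begin{aligned}
\Pr_{z \in [b(n)]}\left[\mathcal{M}(\hat{D}_z)^{(w+1)} = z\right] &= \frac{1}{b(n)} \sum_{z \in [b(n)]} \Pr\left[\mathcal{M}(\hat{D}_z)^{(w+1)} = z\right] \\
&\le \frac{1}{b(n)} \sum_{z \in [b(n)]} \left( e^{\varepsilon} \Pr\left[\mathcal{M}(\hat{D}_0)^{(w+1)} = z\right] + \delta(n+1) \right) \\
&\le \frac{e^{\varepsilon}}{b(n)} + \delta(n+1),
\end{aligned}$$

where all probabilities are also taken over the coins of $\mathcal{M}$. Thus $x \le \max D$ except with probability
at most $e^{\varepsilon}/b(n) + \delta(n+1)$. By a union bound, $\min D \le x \le \max D$ with probability greater than

$$P_{n+1} - \left( \frac{e^{\varepsilon}}{b(n)} + \delta(n+1) \right) \ge P_n.$$

This gives the desired contradiction. □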

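We now prove Theorem 3.2 by estimating the $S(n)$ guaranteed by Lemma 3.3.

Proof of Theorem 3.2. Let $S(n)$ be as in Lemma 3.3. We introduce the following notation for
iterated exponentials:

$$\mathrm{tower}^{(0)}(x) = x \quad \text{and} \quad \mathrm{tower}^{(k)}(x) = 2^{\mathrm{tower}^{(k-1)}(x)}.$$

Observe that for $k \ge 1$, $x > 0$, and $M > 16$,

$$\begin{aligned}
M^{\mathrm{tower}^{(k)}(x)} &= 2^{\mathrm{tower}^{(k)}(x) \log M} \\
&= \mathrm{tower}^{(2)}(\mathrm{tower}^{(k-1)}(x) + \log\log M) \\
&\le \mathrm{tower}^{(2)}(\mathrm{tower}^{(k-1)}(x + \log\log M)) \\
&= \mathrm{tower}^{(k+1)}(x + \log\log M).
\end{aligned}$$

By induction on $n$ we get an upper bound of

$$S(n+1) \le \mathrm{tower}^{(n)}(2 + n \log\log(cn^2)) \le \mathrm{tower}^{(n + \log^*(cn^2))}(1).$$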
This immediately shows that solving the interior point problem on $X = [S(n)]$ requires sample
complexity

$$\begin{aligned}
n &\ge \log^* S(n) - \log^*(cn^2) \\
&\ge \log^* S(n) - O(\log^* \log^* S(n)) \\
&= \log^* |X| - O(\log^* \log^* |X|).
\end{aligned}$$

To get a lower bound for solving the interior point problem on $X$ when $|X|$ is not of the form $S(n)$,
note that a mechanism for $X$ is also a mechanism for every $X'$ s.t. $|X'| \le |X|$. The lower bound
follows by setting $|X'| = S(n)$ for the largest $n$ such that $S(n) \le |X|$. □
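
To give a feel for how slowly $\log^*$ grows relative to iterated exponentials, the following sketch implements both functions and spot-checks the tower inequality above on one small instance. The convention used here for $\log^*$ (the number of times $\log_2$ is applied before the value drops to at most 1) is an assumption; the function names are illustrative.

```python
import math

def tower(k, x):
    """Iterated exponential: tower(0, x) = x and tower(k, x) = 2 ** tower(k - 1, x)."""
    for _ in range(k):
        x = 2 ** x
    return x

def log_star(x):
    """Number of times log2 must be applied before the value is at most 1."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count

print([log_star(tower(k, 1)) for k in range(6)])   # log* inverts the tower started at 1
# One instance of M^{tower^(k)(x)} <= tower^(k+1)(x + log log M): M = 256, k = 1, x = 1.
print(256 ** tower(1, 1) <= tower(2, 1 + math.log2(math.log2(256))))
```

Even `tower(5, 1)` is an integer with 65537 bits, yet its $\log^*$ is only 5, which is why a lower bound of $\Omega(\log^* |X|)$ is mild for any realistic finite domain while still ruling out infinite ones.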


3.3 Upper Bound 

We now present a recursive algorithm, RecPrefix, for privately solving the interior point problem. 


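Theorem 3.4. Let $\beta, \varepsilon, \delta > 0$, let $X$ be a finite, totally ordered domain, and let $n \in \mathbb{N}$ with
$n \ge \frac{18500}{\varepsilon} \cdot 2^{\log^* |X|} \cdot \log^*(|X|) \cdot \ln\left(\frac{4 \log^* |X|}{\beta \varepsilon \delta}\right)$. If RecPrefix (defined below) is executed on a database
$S \in X^n$ with parameters $\frac{\beta}{3 \log^* |X|}, \frac{\varepsilon}{2 \log^* |X|}, \frac{\delta}{2 \log^* |X|}$, then

1. RecPrefix is $(\varepsilon, \delta)$-differentially private;

2. With probability at least $(1 - \beta)$, the output $x$ satisfies
$\min\{x_i : x_i \in S\} \le x \le \max\{x_i : x_i \in S\}$.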


The idea of the algorithm is that on each level of recursion, RecPrefix takes an input database
$S$ over $X$ and constructs a database $S'$ over a smaller universe $X'$, where $|X'| = \log |X|$, in which
every element is the length of the longest common prefix of a pair of elements in $S$ (represented in
binary). In a sense, this reverses the construction presented in Section 3.2.
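
The universe-shrinking step can be illustrated with a short sketch. Pairing consecutive entries of the database is an assumption made here for simplicity, as are the function names and the 8-bit toy universe; the text above only specifies that each new element is the prefix length of some pair.

```python
def to_bits(value, width):
    """Fixed-width binary representation, most significant bit first."""
    return format(value, "0{}b".format(width))

def prefix_database(database, width):
    """Pair up consecutive entries and record each pair's longest common prefix length."""
    out = []
    for a, b in zip(database[0::2], database[1::2]):
        sa, sb = to_bits(a, width), to_bits(b, width)
        k = 0
        while k < width and sa[k] == sb[k]:
            k += 1
        out.append(k)
    return out

S = [0b10110000, 0b10111111, 0b10100001, 0b10100000]   # |X| = 2**8
print(prefix_database(S, width=8))   # values lie in {0, ..., 8}, a universe of size about log|X|
```

Each level halves the number of rows while shrinking the universe from $|X|$ to roughly $\log |X|$, so about $\log^* |X|$ levels suffice before the universe has constant size.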


3.3.1 The exponential and choosing mechanisms 

Before formally presenting the algorithm RecPrefix, we introduce several additional algorithmic
tools. One primitive we will use is the exponential mechanism of McSherry and Talwar [MT07].
Let $X^*$ denote the set of all finite databases over the universe $X$. A quality function $q : X^* \times \mathcal{F} \to \mathbb{N}$
defines an optimization problem over the domain $X$ and a finite solution set $\mathcal{F}$: Given a database
$S \in X^*$, choose $f \in \mathcal{F}$ that (approximately) maximizes $q(S, f)$. The exponential mechanism solves
such an optimization problem by choosing a random solution where the probability of outputting
any solution $f$ increases exponentially with its quality $q(S, f)$. Specifically, it outputs each $f \in \mathcal{F}$
with probability $\propto \exp(\varepsilon \cdot q(S, f) / 2\Delta q)$. Here, the sensitivity of a quality function, $\Delta q$, is the
maximum over all $f \in \mathcal{F}$ of the sensitivity of the function $q(\cdot, f)$.

Proposition 3.5 (Properties of the Exponential Mechanism).

1. The exponential mechanism is $\varepsilon$-differentially private.

2. Let $q$ be a quality function with sensitivity at most 1. Fix a database $S \in X^n$ and let
$\mathrm{OPT} = \max_{f \in \mathcal{F}} \{q(S, f)\}$. Let $t > 0$. Then the exponential mechanism outputs a solution $f$ with
$q(S, f) \le \mathrm{OPT} - tn$ with probability at most $|\mathcal{F}| \cdot \exp(-\varepsilon t n / 2)$.
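
A minimal sketch of the exponential mechanism over a small solution set is given below. The quality function `interior_quality`, which scores a candidate by how deep it sits inside the data, is an illustrative assumption with sensitivity 1 and is not claimed to be the quality function used by RecPrefix; the toy database and names are likewise hypothetical.

```python
import math
import random

def exponential_mechanism(database, solutions, quality, eps, sensitivity, rng):
    """Pick f with probability proportional to exp(eps * q(S, f) / (2 * sensitivity))."""
    scores = [eps * quality(database, f) / (2.0 * sensitivity) for f in solutions]
    top = max(scores)  # subtracting a constant leaves the distribution unchanged
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(solutions, weights=weights, k=1)[0]

def interior_quality(database, f):
    """min(#rows <= f, #rows >= f): zero outside [min D, max D], positive inside."""
    below = sum(1 for x in database if x <= f)
    above = sum(1 for x in database if x >= f)
    return min(below, above)

rng = random.Random(0)
S = [10] * 20 + [20] * 20
candidates = list(range(32))
print(exponential_mechanism(S, candidates, interior_quality, eps=1.0, sensitivity=1.0, rng=rng))
```

Because every candidate outside $[10, 20]$ has quality 0 while every candidate inside has quality 20, the output is an interior point except with probability exponentially small in $\varepsilon n$, in line with item 2 of Proposition 3.5.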



We will also use an $(\varepsilon, \delta)$-differentially private variant of the exponential mechanism called the
choosing mechanism, introduced in [BNS13b].

A quality function with sensitivity at most 1 is of $k$-bounded-growth if adding an element to a
database can increase (by 1) the score of at most $k$ solutions, without changing the scores of other
solutions. Specifically, it holds that

1. $q(\emptyset, f) = 0$ for all $f \in \mathcal{F}$,

2. If $S_2 = S_1 \cup \{x\}$, then $q(S_1, f) + 1 \ge q(S_2, f) \ge q(S_1, f)$ for all $f \in \mathcal{F}$, and

3. There are at most $k$ values of $f$ for which $q(S_2, f) = q(S_1, f) + 1$.

The choosing mechanism is a differentially private algorithm for approximately solving bounded-growth
choice problems. Step 1 of the algorithm checks whether a good solution exists (otherwise
any solution is approximately optimal) and Step 2 invokes the exponential mechanism, but with
the small set $G(S)$ of good solutions instead of $\mathcal{F}$.


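Algorithm 2 Choosing Mechanism

Input: database $S$, quality function $q$, solution set $\mathcal{F}$, and parameters $\beta, \varepsilon, \delta$ and $k$.

1. Set $\widetilde{\mathrm{OPT}} = \mathrm{OPT} + \mathrm{Lap}([\ldots])$. If $\widetilde{\mathrm{OPT}} < [\ldots] \ln([\ldots])$ then halt and return $\bot$.

2. Let $G(S) = \{f \in \mathcal{F} : q(S, f) \ge 1\}$. Choose and return $f \in G(S)$ using the exponential
mechanism with parameter $[\ldots]$.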


The following lemmas give the privacy and utility guarantees of the choosing mechanism. We
give a slightly improved utility result over [BNS13b], and the analysis is presented in Appendix A.

Lemma 3.6. Fix $\delta > 0$ and $0 < \varepsilon < 2$. If $q$ is a $k$-bounded-growth quality function, then the
choosing mechanism is $(\varepsilon, \delta)$-differentially private.

Lemma 3.7. Let the choosing mechanism be executed on a $k$-bounded-growth quality function, and
on a database $S$ s.t. there exists a solution $f$ with quality $q(S, f) \ge [\ldots] \ln([\ldots])$. With probability at
least $(1 - \beta)$, the choosing mechanism outputs a solution $f$ with quality $q(S, f) \ge 1$.

Lemma 3.8. Let the choosing mechanism be executed on a $k$-bounded-growth quality function, and
on a database $S$ containing $m$ elements. With probability at least $(1 - \beta)$, the choosing mechanism
outputs a solution $f$ with quality $q(S, f) \ge \mathrm{OPT} - [\ldots] \ln([\ldots])$.

3.3.2 The RecPrefix algorithm 

We are now ready to present and analyze the algorithm RecPrefix.

Algorithm 3 RecPrefix

Input: Database $S = (x_j)_{j=1}^{n} \in X^n$, parameters $\beta, \varepsilon, \delta$.

1. If $|X| \le 32$, then use the exponential mechanism with privacy parameter $\varepsilon$ and quality
function $q(S, x) = \min\{\#\{j : x_j \ge x\}, \#\{j : x_j \le x\}\}$ to choose and return a point $x \in X$.

2. Let $k = \lfloor [\ldots] \ln([\ldots]) \rfloor$, and let $Y = (y_1, y_2, \ldots, y_{n-2k})$ be a random permutation of the
smallest $(n - 2k)$ elements in $S$.

3. For $j = 1$ to $\frac{n-2k}{2}$, define $z_j$ as the length of the longest prefix for which $y_{2j-1}$ agrees with
$y_{2j}$ (in base 2 notation).

4. Execute RecPrefix recursively on $S' = (z_j)_{j=1}^{(n-2k)/2} \in (X')^{(n-2k)/2}$ with parameters $\beta, \varepsilon, \delta$.
Recall that $|X'| = \log|X|$. Denote the returned value by $z$.

5. Use the choosing mechanism to choose a prefix $L$ of length $(z + 1)$ with a large number
of agreements among elements in $S$. Use parameters $\beta, \varepsilon, \delta$, and the quality function $q :
X^* \times \{0,1\}^{z+1} \to \mathbb{N}$, where $q(S, I)$ is the number of agreements on $I$ among $x_1, \ldots, x_n$.

6. For $\sigma \in \{0, 1\}$, define $L_\sigma \in X$ to be the prefix $L$ followed by $(\log|X| - z - 1)$ appearances
of $\sigma$.

7. Compute $\widehat{\mathrm{big}} = \mathrm{Lap}(1/\varepsilon) + \#\{j : x_j \ge L_1\}$.

8. If $\widehat{\mathrm{big}} \ge 3k/2$ then return $L_1$. Otherwise return $L_0$.
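The pairing in Steps 2 and 3 can be sketched in Python as follows. This is only the non-private preprocessing that builds $S'$; it does not implement the recursion, the choosing mechanism, or the noisy test of Steps 7 and 8. The function names and the convention that elements are non-negative integers written with a fixed number of bits are assumptions of this sketch.

```python
import random


def common_prefix_length(a, b, bits):
    """Length of the longest common prefix of a and b as bits-bit binary strings."""
    diff = a ^ b
    # If a == b all bits agree; otherwise the prefix ends just before
    # the most significant bit in which they differ.
    return bits if diff == 0 else bits - diff.bit_length()


def build_prefix_database(database, k, bits, rng=None):
    """Steps 2-3: shuffle the smallest n - 2k elements and pair them up."""
    rng = rng or random.Random()
    smallest = sorted(database)[: len(database) - 2 * k]
    rng.shuffle(smallest)
    return [
        common_prefix_length(smallest[2 * j], smallest[2 * j + 1], bits)
        for j in range(len(smallest) // 2)
    ]
```

Every entry of the result lies in $\{0, \ldots, \mathtt{bits}\}$, which is why the recursive call operates over a universe of size roughly $\log|X|$.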


We start the analysis of RecPrefix with the following two simple observations.

Observation 3.9. There are at most $\log^* |X|$ recursive calls throughout the execution of RecPrefix
on a database $S \in X^*$.
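To get a feel for how slowly this recursion depth grows, here is a small Python sketch of the iterated logarithm. It assumes the common convention that $\log^* n$ is the number of times $\log_2$ must be applied before the value drops to at most 1; the paper's exact convention may differ by an additive constant.

```python
import math


def log_star(n):
    """Number of times log2 must be applied to n before the result is at most 1."""
    count = 0
    value = n
    while value > 1:
        # math.log2 accepts arbitrarily large Python integers.
        value = math.log2(value)
        count += 1
    return count
```

Under this convention a universe of size $2^{65536}$ still has $\log^*$ equal to 5.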

Observation 3.10. Let RecPrefix be executed on a database $S \in X^n$, where $n \ge 2^{\log^*|X|} \cdot [\ldots] \cdot
\ln([\ldots])$. Every recursive call throughout the execution operates on a database containing at least
$[\ldots] \cdot \ln([\ldots])$ elements.

Proof. This follows from Observation 3.9 and from the fact that the recursive call is executed
on a database of size $n' = \frac{n-2k}{2} = \frac{n}{2} - k$.

We now analyze the utility guarantees of RecPrefix by proving the following lemma.

Lemma 3.11. Let $\beta, \varepsilon, \delta$, and $S \in X^n$ be inputs on which RecPrefix performs at most $N$ recursive
calls, all of which are on databases of at least $[\ldots] \cdot \ln([\ldots])$ elements. With probability at least
$(1 - 3\beta N)$, the output $x$ is s.t.

1. $\exists x_i \in S$ s.t. $x_i \le x$;

2. $|\{i : x_i \ge x\}| \ge k = \lfloor [\ldots] \cdot \ln([\ldots]) \rfloor$.

Before proving the lemma, we make a combinatorial observation that motivates the random
shuffling in Step 2 of RecPrefix. A pair of elements $y, y' \in S$ is useful in Algorithm RecPrefix if
many of the values in $S$ lie between $y$ and $y'$: a prefix on which $y, y'$ agree is also a prefix of every
element between $y$ and $y'$. A prefix common to a useful pair can hence be identified privately via
stability-based techniques. Towards creating useful pairs, the set $S$ is shuffled randomly. We will
use the following claim:


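Claim 3.12. Let $(\Pi_1, \Pi_2, \ldots, \Pi_n)$ be a random permutation of $(1, 2, \ldots, n)$. Then for all $r \ge 1$,

\[
\Pr\left[ \left| \left\{ i : |\Pi_{2i-1} - \Pi_{2i}| \le \frac{r}{12} \right\} \right| \ge r \right] \le 2^{-r}.
\]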


Proof. We need to show that w.h.p. there are at most $r$ “bad” pairs $(\Pi_{2i-1}, \Pi_{2i})$ within distance
$\frac{r}{12}$. For each $i$, we call $\Pi_{2i-1}$ the left side of the pair, and $\Pi_{2i}$ the right side of the pair. Let us first
choose $r$ elements to be placed on the left side of $r$ bad pairs (there are $\binom{n}{r}$ such choices). Once
those are fixed, there are at most $\left(\frac{r}{6}\right)^r$ choices for placing elements on the right side of those pairs.
Now we have $r$ pairs and $n - 2r$ unpaired elements that can be shuffled in $(n - r)!$ ways. Overall,
the probability of having at least $r$ bad pairs is at most

\[
\frac{\binom{n}{r} \left(\frac{r}{6}\right)^r (n-r)!}{n!} = \frac{\left(\frac{r}{6}\right)^r}{r!} \le \frac{\left(\frac{r}{6}\right)^r}{\sqrt{r}\, r^r e^{-r}} = \frac{e^r}{\sqrt{r}\, 6^r} \le 2^{-r},
\]

where we have used Stirling's approximation for the first inequality. □
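The event bounded in Claim 3.12 can also be explored empirically. The Python sketch below counts the "bad" pairs of a permutation and estimates how often at least $r$ of them occur. It is an illustration only, not part of the proof, and the function names and the fixed-seed Monte Carlo setup are assumptions of this sketch.

```python
import random


def count_bad_pairs(perm, r):
    """Count pairs (perm[2i], perm[2i+1]) whose values differ by at most r / 12."""
    return sum(
        1
        for i in range(len(perm) // 2)
        if abs(perm[2 * i] - perm[2 * i + 1]) <= r / 12
    )


def estimate_failure_rate(n, r, trials, seed=0):
    """Empirical frequency of seeing at least r bad pairs in a random permutation."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)
        if count_bad_pairs(perm, r) >= r:
            failures += 1
    return failures / trials
```

Comparing `estimate_failure_rate(n, r, trials)` with $2^{-r}$ for a few values of $n$ and $r$ gives a quick sanity check of the bound.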


Suppose we have paired random elements in our input database $S$, and constructed a database
$S'$ containing lengths of the prefixes for those pairs. Moreover, assume that by recursion we have
identified a length $z$ which is the prefix length of at least $r$ random pairs. Although those prefixes may be
different for each pair, Claim 3.12 guarantees that (w.h.p.) at least one of these prefixes is the
prefix of at least $\frac{r}{12}$ input elements. This will help us in (privately) identifying such a prefix.

Proof of Lemma 3.11. The proof is by induction on the number of recursive calls, denoted as $t$.
For $t = 1$ (i.e., $|X| \le 32$), the claim holds as long as the exponential mechanism outputs an
$x$ with $q(S, x) \ge k$ except with probability at most $\beta$. By Proposition 3.5, it suffices to have
$n \ge [\ldots] \cdot \ln([\ldots])$, since $32 \exp(-\varepsilon(n/2 - k)/2) \le \beta$.

Assume that the stated lemma holds whenever RecPrefix performs at most $t - 1$ recursive calls.
Let $\beta, \varepsilon, \delta$ and $S = (x_i)_{i=1}^{n} \in X^n$ be inputs on which algorithm RecPrefix performs $t$ recursive calls,
all of which are on databases containing at least $[\ldots] \cdot \ln([\ldots])$ elements. Consider the first call in
the execution on those inputs, and let $y_1, \ldots, y_{n-2k}$ be the random permutation chosen on Step 2.
We say that a pair $y_{2j-1}, y_{2j}$ is close if

\[
\left| \left\{ i : y_{2j-1} \le y_i \le y_{2j} \ \text{ or } \ y_{2j} \le y_i \le y_{2j-1} \right\} \right| \le \frac{k-1}{12}.
\]

By Claim 3.12, except with probability at most $2^{-(k-1)} < \beta$, there are at most $(k - 1)$ close pairs.
We continue the proof assuming that this is the case.

Let $S' = (z_j)_{j=1}^{(n-2k)/2}$ be the database constructed in Step 3. By the inductive assumption, with
probability at least $(1 - 3\beta(t - 1))$, the value $z$ obtained in Step 4 is s.t. (1) $\exists z_i \in S'$ s.t. $z_i \le z$;
and (2) $|\{z_i \in S' : z_i \ge z\}| \ge k$. We proceed with the analysis assuming that this event happened.

By (2), there are at least $k$ pairs $y_{2j-1}, y_{2j}$ that agree on a prefix of length at least $z$. At least
one of those pairs, say $y_{2j^*-1}, y_{2j^*}$, is not close. Note that every $y$ between $y_{2j^*-1}$ and $y_{2j^*}$ agrees
on the same prefix of length $z$, and that there are at least $\frac{k-1}{12}$ such elements in $S$. Moreover, as
the next bit is either 0 or 1, at least half of those elements agree on a prefix of length $(z + 1)$. Thus,
when using the choosing mechanism on Step 5 (to choose a prefix of length $(z + 1)$), there exists at
least one prefix with quality at least $[\ldots] \cdot \ln([\ldots])$. By Lemma 3.8, the choosing mechanism
ensures, therefore, that with probability at least $(1 - \beta)$, the chosen prefix $L$ is the prefix of at least
one $y_{i'} \in S$, and, hence, this $y_{i'}$ satisfies $L_0 \le y_{i'} \le L_1$ (defined in Step 6). We proceed with the
analysis assuming that this is the case.

Let $z_j \in S'$ be s.t. $z_j \le z$. By the definition of $z_j$, this means that $y_{2j-1}$ and $y_{2j}$ agree on a
prefix of length at most $z$. Hence, as $L$ is of length $z + 1$, we have that either $\min\{y_{2j-1}, y_{2j}\} < L_0$
or $\max\{y_{2j-1}, y_{2j}\} > L_1$. If $\min\{y_{2j-1}, y_{2j}\} < L_0$, then $L_0$ satisfies Condition 1 of being a good
output. It also satisfies Condition 2 because $y_{i'} \ge L_0$ and $y_{i'} \in Y$, which we took to be the smallest
$n - 2k$ elements of $S$. Similarly, $L_1$ is a good output if $\max\{y_{2j-1}, y_{2j}\} > L_1$. In any case, at least
one out of $L_0, L_1$ is a good output.

If both $L_0$ and $L_1$ are good outputs, then Step 8 cannot fail. We have already established
the existence of $L_0 \le y_{i'} \le L_1$. Hence, if $L_1$ is not a good output, then there are at most $(k - 1)$
elements $x_i \in S$ s.t. $x_i \ge L_1$. Hence, the probability of $\widehat{\mathrm{big}} \ge 3k/2$ and Step 8 failing is at most
$\exp(-\varepsilon k/2) < \beta$. It remains to analyze the case where $L_0$ is not a good output (and $L_1$ is).

If $L_0$ is not a good output, then every $x_j \in S$ satisfies $x_j > L_0$. In particular, $\min\{y_{2j-1}, y_{2j}\} >
L_0$, and, hence, $\max\{y_{2j-1}, y_{2j}\} > L_1$. Recall that there are at least $2k$ elements in $S$ which are
bigger than $\max\{y_{2j-1}, y_{2j}\}$. As $k \ge [\ldots] \ln([\ldots])$, the probability that $\widehat{\mathrm{big}} < 3k/2$ and RecPrefix fails
to return $L_1$ in this case is at most $\beta$.

All in all, RecPrefix fails to return an appropriate $x$ with probability at most $3\beta t$. □

We now proceed with the privacy analysis.

Lemma 3.13. When executed for $N$ recursive calls, RecPrefix is $(2\varepsilon N, 2\delta N)$-differentially private.

Proof. The proof is by induction on the number of recursive calls, denoted by $t$. For $t = 1$ (i.e.,
$|X| \le 32$), by Proposition 3.5 the exponential mechanism ensures that RecPrefix is $(\varepsilon, 0)$-differentially
private. Assume that the stated lemma holds whenever RecPrefix performs at most
$t - 1$ recursive calls, and let $S_1, S_2 \in X^*$ be two neighboring databases on which RecPrefix performs
$t$ recursive calls.* Let $\mathcal{B}$ denote an algorithm consisting of Steps 1-4 of RecPrefix (the output of $\mathcal{B}$
is the value $z$ from Step 4). Consider the executions of $\mathcal{B}$ on $S_1$ and on $S_2$, and denote by $Y_1, S_1'$
and by $Y_2, S_2'$ the elements $Y, S'$ as they are in the executions on $S_1$ and on $S_2$.

We show that the distributions on the databases $S_1'$ and $S_2'$ are similar in the sense that for
each database in one of the distributions there exists a neighboring database in the other that has
the same probability. Thus, applying the recursion (which is differentially private by the inductive
assumption) preserves privacy. We now make this argument formal.

First note that as $S_1, S_2$ differ in only one element, there is a bijection between orderings $\Pi$
and $\hat{\Pi}$ of the smallest $(n - 2k)$ elements of $S_1$ and of $S_2$ respectively s.t. $Y_1^{\Pi}$ and $Y_2^{\hat{\Pi}}$ are neighboring
databases. This is because there exists a permutation of the smallest $(n - 2k)$ elements of $S_1$ that is
a neighbor of the smallest $(n - 2k)$ elements of $S_2$; composition with this fixed permutation yields
the desired bijection. Moreover, note that whenever $Y_1, Y_2$ are neighboring databases, the same is
true for $S_1'$ and $S_2'$. Hence, for every set of outputs $F$ it holds that

\[
\begin{aligned}
\Pr[\mathcal{B}(S_1) \in F] &= \sum_{\Pi} \Pr[\Pi] \cdot \Pr[\mathrm{RecPrefix}(S_1') \in F \mid \Pi] \\
&\le e^{2\varepsilon(t-1)} \cdot \sum_{\Pi} \Pr[\Pi] \cdot \Pr[\mathrm{RecPrefix}(S_2') \in F \mid \hat{\Pi}] + 2\delta(t-1) \\
&= e^{2\varepsilon(t-1)} \cdot \sum_{\hat{\Pi}} \Pr[\hat{\Pi}] \cdot \Pr[\mathrm{RecPrefix}(S_2') \in F \mid \hat{\Pi}] + 2\delta(t-1) \\
&= e^{2\varepsilon(t-1)} \cdot \Pr[\mathcal{B}(S_2) \in F] + 2\delta(t-1).
\end{aligned}
\]

So when executed for $t$ recursive calls, the sequence of Steps 1-4 of RecPrefix is $(2\varepsilon(t-1), 2\delta(t-1))$-differentially
private. On Steps 5 and 7, algorithm RecPrefix interacts with its database through the
choosing mechanism and the Laplace mechanism, each of which is $(\varepsilon, \delta)$-differentially private.
By composition (Lemma 2.2), we get that RecPrefix is $(2t\varepsilon, 2t\delta)$-differentially private. □

*The recursion depth is determined by $|X|$, which is identical in $S_1$ and in $S_2$.

Combining Lemma 3.11 and Lemma 3.13 we obtain Theorem 3.4. 

3.3.3 Informal Discussion and Open Questions 

A natural open problem is to close the gap between our (roughly) $2^{\log^*|X|}$ upper bound on the
sample complexity of privately solving the interior point problem (Theorem 3.4), and our $\log^*|X|$ lower
bound (Theorem 3.2). Below we describe an idea for reducing the upper bound to $\mathrm{poly}(\log^*|X|)$.

In our recursive construction for the lower bound, we took $n$ elements $(x_1, \ldots, x_n)$ and generated
$n + 1$ elements $(y_0, y_1, \ldots, y_n)$, where $y_0$ is a random element (independent of the $x_i$'s), and every $x_i$ is the length
of the longest common prefix of $y_0$ and $y_i$. Therefore, a change limited to one $x_i$ affects only
one $y_i$ and privacy is preserved (assuming that our future manipulations on $(y_0, \ldots, y_n)$ preserve
privacy). While the representation length of domain elements grows exponentially on every step,
the database size grows by 1. This resulted in the $\Omega(\log^*|X|)$ lower bound.

In RecPrefix on the other hand, every level of recursion shrank the database size by a factor
of 2, and hence we required a sample of (roughly) $2^{\log^*|X|}$ elements. Specifically, in each level
of recursion, two input elements $y_{2j-1}, y_{2j}$ were paired and a new element $z_j$ was defined as the
length of their longest common prefix. As with the lower bound, we wanted to ensure that a change
limited to one of the inputs affects only one new element, and hence, every input element is paired
only once, and the database size shrinks.

If we could pair input elements twice then the database size would only be reduced additively
(which will hopefully result in a $\mathrm{poly}(\log^*|X|)$ upper bound). However, this must be done carefully,
as we are at risk of deteriorating the privacy parameter $\varepsilon$ by a factor of 2 and thus remaining with
an exponential dependency in $\log^*|X|$. Consider the following thought experiment for pairing
elements.

Input: $(x_1, \ldots, x_n) \in X^n$.

1. Let $(y_1^0, \ldots, y_n^0)$ denote a random permutation of $(x_1, \ldots, x_n)$.

2. For $t = 1$ to $\log^*|X|$:

For $i = 1$ to $(n - t)$, let $y_i^t$ be the length of the longest common prefix of $y_i^{t-1}$
and $y_{i+1}^{t-1}$.

As (most of the) elements are paired twice on every step, the database size reduces additively.
In addition, every input element $x_i$ affects at most $t + 1$ elements at depth $t$, and the privacy loss is
acceptable. However, this still does not solve the problem. Recall that every iteration of RecPrefix
begins by randomly shuffling the inputs. Specifically, we needed to ensure that (w.h.p.) the number
of “close” pairs is limited. The reason was that if a “not close” pair agrees on a prefix $L$, then $L$ is
the prefix of “a lot” of other elements as well, and we could privately identify $L$. In the above process
we randomly shuffled only the elements at depth 0. Thus we do not know if the number of “close”
pairs is small at depth $t > 0$. On the other hand, if we changed the pairing procedure to shuffle at
every step, then each input element $x_i$ might affect $2^t$ elements at depth $t$, causing the privacy loss
to deteriorate rapidly.
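The thought experiment above can be sketched in Python as follows. This is an illustration of the overlapping pairing only and carries no privacy guarantee. The function names, the integer encoding of elements, and the rule for how many bits are used to write prefix lengths at the next level are assumptions of this sketch.

```python
import random


def common_prefix_length(a, b, bits):
    """Length of the longest common prefix of a and b as bits-bit binary strings."""
    diff = a ^ b
    return bits if diff == 0 else bits - diff.bit_length()


def overlapping_pairing(database, bits, depth, rng=None):
    """Shuffle once, then repeatedly replace neighbours (i, i+1) by their prefix length."""
    rng = rng or random.Random()
    level = list(database)
    rng.shuffle(level)  # only depth 0 is shuffled
    for _ in range(depth):
        level = [
            common_prefix_length(level[i], level[i + 1], bits)
            for i in range(len(level) - 1)
        ]
        # Prefix lengths lie in {0, ..., bits}, so the next level needs
        # only enough bits to write the number `bits` itself.
        bits = max(1, bits.bit_length())
    return level
```

Each level is one element shorter than the previous one, which is the additive shrinkage discussed above.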

4 Query Release and Distribution Learning 

4.1 Definitions 

Recall that a counting query $q$ is a predicate $q : X \to \{0, 1\}$. For a database $D = (x_1, \ldots, x_n) \in X^n$,
we write $q(D)$ to denote the average value of $q$ over the rows of $D$, i.e. $q(D) = \frac{1}{n} \sum_{i=1}^{n} q(x_i)$. In
the query release problem, we seek differentially private algorithms that can output approximate
answers to a family of counting queries $Q$ simultaneously.

Definition 4.1 (Query Release). Let $Q$ be a collection of counting queries on a data universe
$X$, and let $\alpha, \beta > 0$ be parameters. For a database $D \in X^n$, a sequence of answers $\{a_q\}_{q \in Q}$ is
$\alpha$-accurate for $Q$ if $|a_q - q(D)| \le \alpha$ for every $q \in Q$. An algorithm $A : X^n \to \mathbb{R}^{|Q|}$ is $(\alpha, \beta)$-accurate
for $Q$ if for every $D \in X^n$, the output $A(D)$ is $\alpha$-accurate for $Q$ with probability at least $1 - \beta$ over
the coins of $A$. The sample complexity of the algorithm $A$ is the database size $n$.

We are interested in the query release problem for the class $\mathrm{THRESH}_X$ of threshold queries, which
we view as a class of counting queries.

We are also interested in the following distribution learning problem, which is very closely
related to the query release problem.

Definition 4.2 (Distribution Learning with respect to $Q$). Let $Q$ be a collection of counting queries
on a data universe $X$. Algorithm $A$ is an $(\alpha, \beta)$-accurate distribution learner with respect to $Q$ with
sample complexity $n$ if for all distributions $\mathcal{D}$ on $X$, given an input of $n$ samples $D = (x_1, \ldots, x_n)$
where each $x_i$ is drawn i.i.d. from $\mathcal{D}$, algorithm $A$ outputs a distribution $\mathcal{D}'$ on $X$ (specified by
its PMF) satisfying $d_Q(\mathcal{D}, \mathcal{D}') = \sup_{q \in Q} |\mathbb{E}_{x \sim \mathcal{D}}[q(x)] - \mathbb{E}_{x \sim \mathcal{D}'}[q(x)]| \le \alpha$ with probability at least
$1 - \beta$.

We highlight two important special cases of the distance measure $d_Q$ in the distribution learning
problem. First, when $Q$ is the collection of all counting queries on a domain $X$, the distance $d_Q$ is
the total variation distance between distributions, defined by

\[
d_{TV}(\mathcal{D}, \mathcal{D}') = \sup_{S \subseteq X} \left| \Pr_{x \sim \mathcal{D}}[x \in S] - \Pr_{x \sim \mathcal{D}'}[x \in S] \right|.
\]

Second, when $X$ is a totally ordered domain and $Q = \mathrm{THRESH}_X$, the distance $d_Q$ is the Kolmogorov
(or CDF) distance. A distribution learner in the latter case may as well output a CDF that
approximates the target CDF in $\ell_\infty$ norm. Specifically, we define

Definition 4.3 (Cumulative Distribution Function (CDF)). Let $\mathcal{D}$ be a distribution over a totally
ordered domain $X$. The CDF $F_{\mathcal{D}}$ of $\mathcal{D}$ is defined by $F_{\mathcal{D}}(t) = \Pr_{x \sim \mathcal{D}}[x \le t]$. If $X$ is finite, then any
function $F : X \to [0, 1]$ that is non-decreasing with $F(\max X) = 1$ is a CDF.

Definition 4.4 (Distribution Learning with respect to Kolmogorov distance). Algorithm $A$ is an
$(\alpha, \beta)$-accurate distribution learner with respect to Kolmogorov distance with sample complexity $n$ if
for all distributions $\mathcal{D}$ on a totally ordered domain $X$, given an input of $n$ samples $D = (x_1, \ldots, x_n)$
where each $x_i$ is drawn i.i.d. from $\mathcal{D}$, algorithm $A$ outputs a CDF $F$ with $\sup_{x \in X} |F(x) - F_{\mathcal{D}}(x)| \le \alpha$
with probability at least $1 - \beta$.
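For a finite ordered domain, both distances highlighted above are easy to compute from PMFs. The following Python sketch is illustrative only; the function names and the representation of a distribution as a list of probabilities in increasing order of the domain are assumptions of this sketch.

```python
from itertools import accumulate


def cdf_from_pmf(pmf):
    """Running sums of a PMF listed in increasing order of the domain."""
    return list(accumulate(pmf))


def kolmogorov_distance(pmf_a, pmf_b):
    """Largest gap between the two CDFs, i.e. the sup over threshold queries."""
    cdf_a, cdf_b = cdf_from_pmf(pmf_a), cdf_from_pmf(pmf_b)
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def total_variation_distance(pmf_a, pmf_b):
    """sup over all subsets S, which equals half the L1 distance of the PMFs."""
    return 0.5 * sum(abs(a - b) for a, b in zip(pmf_a, pmf_b))
```

For the PMFs `[0.5, 0, 0, 0.5]` and `[0, 0.5, 0.5, 0]` the Kolmogorov distance is 0.5 while the total variation distance is 1, showing that closeness with respect to thresholds is the weaker requirement.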

The query release problem for a collection of counting queries Q is very closely related to the 
distribution learning problem with respect to Q. In particular, solving the query release problem on 
a dataset D amounts to learning the empirical distribution of D. Conversely, results in statistical 
learning theory show that one can solve the distribution learning problem by first solving the 
query release problem on a sufficiently large random sample, and then fitting a distribution to 
approximately agree with the released answers. The requisite size of this sample (without privacy 
considerations) is characterized by a combinatorial measure of the class Q called the VC dimension: 

Definition 4.5. Let $Q$ be a collection of queries over domain $X$. A set $S = \{x_1, \ldots, x_k\} \subseteq X$
is shattered by $Q$ if for every $T \subseteq [k]$ there exists $q \in Q$ such that $T = \{i : q(x_i) = 1\}$. The
Vapnik-Chervonenkis (VC) dimension of $Q$, denoted $\mathrm{VC}(Q)$, is the cardinality of the largest set
$S \subseteq X$ that is shattered by $Q$.

It is known [AB09] that solving the query release problem on $256\,\mathrm{VC}(Q) \ln(48/\alpha\beta)/\alpha^2$ random
samples yields an $(\alpha, \beta)$-accurate distribution learner for a query class $Q$.

4.2 Equivalences with the Interior Point Problem 

4.2.1 Private Release of Thresholds vs. the Interior Point Problem 

We show that the problems of privately releasing thresholds and solving the interior point problem 
are equivalent. 

Theorem 4.6. Let $X$ be a totally ordered domain. Then,

1. If there exists an $(\varepsilon, \delta)$-differentially private algorithm that is able to release threshold queries
on $X$ with $(\alpha, \beta)$-accuracy and sample complexity $n/(8\alpha)$, then there is an $(\varepsilon, \delta)$-differentially
private algorithm that solves the interior point problem on $X$ with error $\beta$ and sample complexity $n$.

2. If there exists an $(\varepsilon, \delta)$-differentially private algorithm solving the interior point problem on
$X$ with error $\alpha\beta/24$ and sample complexity $m$, then there is a $(5\varepsilon, (1 + e^{\varepsilon})\delta)$-differentially
private algorithm for releasing threshold queries with $(\alpha, \beta)$-accuracy and sample complexity

\[
n = \max\left\{ \frac{6m}{\alpha}, \ \frac{25 \log(24/\beta) \log^{2.5}(6/\alpha)}{\alpha\varepsilon} \right\}.
\]

For the first direction, observe that an algorithm for releasing thresholds could easily be used
for solving the interior point problem. Formally,

Proof of Theorem 4.6 item 1. Suppose $A$ is a private $(\alpha, \beta)$-accurate algorithm for releasing thresholds
over $X$ for databases of size $n/(8\alpha)$. Define $A'$ on databases of size $n$ to pad the database with an
equal number of $\min\{X\}$ and $\max\{X\}$ entries, and run $A$ on the result. We can now return any
point $t$ for which the approximate answer to the query $c_t$ is $(\frac{1}{2} \pm \alpha)$ on the (padded) database. □

We now show the converse, i.e., that the problem of releasing thresholds can be reduced to the 
interior point problem. Specifically, we reduce the problem to a combination of solving the interior 
point problem, and of releasing thresholds on a much smaller data universe. The latter task is 
handled by the following algorithm. 

Lemma 4.7 ([DNPR10]). For every finite data universe $X$, and $n \in \mathbb{N}$, $\varepsilon, \beta > 0$, there is an
$\varepsilon$-differentially private algorithm $A$ that releases all threshold queries on $X$ with $(\alpha, \beta)$-accuracy for

\[
\alpha = \frac{4 \log(1/\beta) \log^{2.5}|X|}{\varepsilon n}.
\]

The idea of the reduction is to create noisy partitions of the input database into $O(1/\alpha)$ blocks
of size roughly $\alpha n/3$. We then solve the interior point problem on each of these blocks, and think
of the results as representatives for each block. By answering threshold queries on just the set
of representatives, we can well-approximate all threshold queries. Moreover, since there are only
$O(1/\alpha)$ representatives, the base algorithm above gives only $\mathrm{polylog}(1/\alpha)$ error for these answers.

In Appendix C, we describe another reduction that, up to constant factors, gives the same 
sample complexity. 

Proof of Theorem 4.6 item 2. Let $R : X^* \to X$ be an $(\varepsilon, \delta)$-differentially private algorithm solving
the interior point problem on $X$ with error $\alpha\beta/24$ and sample complexity $m$. We may actually
assume that $R$ is differentially private in the sense that if $D \in X^*$ and $D'$ differs from $D$ up to the
addition or removal of a row, then for every $S \subseteq X$, $\Pr[R(D) \in S] \le e^{\varepsilon} \Pr[R(D') \in S] + \delta$, and
that $R$ solves the interior point problem with probability at least $1 - \alpha\beta/24$ whenever its input is
of size at least $m$. This is because we can pad databases of size less than $m$ with an arbitrary fixed
element, and subsample the first $m$ entries from any database with size greater than $m$.

Consider the following algorithm for answering thresholds on databases $D \in X^n$ for $n > m$:

Algorithm 4 Thresh($D$)

Input: Database $D = (x_1, \ldots, x_n) \in X^n$.

1. Sort $D$ in nondecreasing order $x_1 \le x_2 \le \cdots \le x_n$.

2. Set $k = 6/\alpha$ and let $t_0 = 1$, $t_1 = t_0 + \alpha n/3 + \nu_1$, $t_2 = t_1 + \alpha n/3 + \nu_2$, $\ldots$, $t_k = t_{k-1} + \alpha n/3 + \nu_k$,
where each $\nu_\ell \sim \mathrm{Lap}(1/\varepsilon)$ independently.

3. Divide $D$ into blocks $D_1, \ldots, D_k$, where $D_\ell = (x_{t_{\ell-1}}, x_{t_{\ell-1}+1}, \ldots, x_{t_\ell - 1})$ (setting $x_j = \max X$ if $j > n$;
note some may be empty).

4. Let $r_0 = \min X$, $r_1 = R(D_1), \ldots, r_k = R(D_k)$ and define $\hat{D}$ from $D$ by replacing each $x_j$ with
the largest $r_\ell$ for which $r_\ell \le x_j$.

5. Run algorithm $A$ from Lemma 4.7 on $\hat{D}$ over the universe $\{r_0, r_1, \ldots, r_k\}$ to obtain threshold
query answers $a_{r_0}, a_{r_1}, \ldots, a_{r_k}$. Use privacy parameter $\varepsilon$ and confidence parameter $\beta/4$.

6. Answer arbitrary threshold queries by interpolation, i.e. for $r_\ell \le t < r_{\ell+1}$, set $a_t = a_{r_\ell}$.

7. Output $(a_t)_{t \in X}$.
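The noisy partition in Steps 2 and 3 can be sketched in Python as follows. This covers only the partitioning; the interior point algorithm $R$ and the release algorithm of Lemma 4.7 are not implemented. Rounding the real-valued boundaries up to integers, forcing the cut points to be non-decreasing, and truncating positions beyond $n$ instead of padding with $\max X$ are simplifying assumptions of this sketch, as are the function names.

```python
import math
import random


def laplace(scale, rng):
    """Lap(scale), sampled as a scaled difference of two Exp(1) variables."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_partition(sorted_db, alpha, epsilon, rng=None):
    """Cut a sorted database into k = 6/alpha blocks at noisy boundaries."""
    rng = rng or random.Random()
    n = len(sorted_db)
    k = round(6 / alpha)
    cuts = [1]  # t_0 = 1, using 1-based positions as in the text
    t = 1.0
    for _ in range(k):
        t += alpha * n / 3 + laplace(1 / epsilon, rng)
        # Assumption: round up and keep the cut points non-decreasing.
        cuts.append(max(cuts[-1], math.ceil(t)))
    # Block l holds positions t_{l-1}, ..., t_l - 1; slices past n are empty.
    return [sorted_db[a - 1 : b - 1] for a, b in zip(cuts, cuts[1:])]
```

With `alpha = 0.1` the database is cut into 60 blocks of roughly $\alpha n/3$ elements each, and the trailing blocks are empty once the database is exhausted.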


Privacy. Let $D = (x_1, \ldots, x_n)$ where $x_1 \le x_2 \le \cdots \le x_n$, and consider a neighboring database
$D' = (x_1, \ldots, x_i', \ldots, x_n)$. Assume without loss of generality that $x_i' \ge x_{i+1}$, and suppose

\[
x_1 \le \cdots \le x_{i-1} \le x_{i+1} \le \cdots \le x_j \le x_i' \le x_{j+1} \le \cdots \le x_n.
\]

We write vectors of noise values as $\nu = (\nu_1, \nu_2, \ldots, \nu_k)$. There is a bijection between noise vectors
$\nu$ and noise vectors $\nu'$ such that $D$ partitioned according to $\nu$ and $D'$ partitioned according to $\nu'$
differ on at most two blocks: namely, if $\ell, r$ are the indices for which $t_{\ell-1} \le i < t_\ell$ and $t_{r-1} \le j < t_r$
(we may have $\ell = r$), then we can take $\nu_\ell' = \nu_\ell - 1$ and $\nu_r' = \nu_r + 1$ with $\nu' = \nu$ at every other
index. Note that $D$ partitioned into $(D_1, \ldots, D_k)$ according to $\nu$ differs from $D'$ partitioned into
$(D_1', \ldots, D_k')$ according to $\nu'$ by a removal of an element from one block (namely $D_\ell$) and the
addition of an element to another block (namely $D_r$). Thus, for every set $S \subseteq X^k$,

\[
\Pr[(R(D_1), \ldots, R(D_k)) \in S \mid \nu] \le e^{2\varepsilon} \Pr[(R(D_1'), \ldots, R(D_k')) \in S \mid \nu'] + (1 + e^{\varepsilon})\delta.
\]

Moreover, under the bijection we constructed between $\nu$ and $\nu'$, noise vector $\nu'$ is sampled with
density at most $e^{2\varepsilon}$ times the density of $\nu$, so for every set $S \subseteq X^k$,

\[
\begin{aligned}
\Pr[(R(D_1), \ldots, R(D_k)) \in S] &\le e^{2\varepsilon} \left( e^{2\varepsilon} \Pr[(R(D_1'), \ldots, R(D_k')) \in S] \right) + (1 + e^{\varepsilon})\delta \\
&= e^{4\varepsilon} \Pr[(R(D_1'), \ldots, R(D_k')) \in S] + (1 + e^{\varepsilon})\delta.
\end{aligned}
\]

Finally, the execution of $A$ at the end of the algorithm is $\varepsilon$-differentially private, so by composition
(Lemma 2.2), we obtain the asserted level of privacy.

Utility. We can produce $\alpha$-accurate answers to every threshold function as long as

1. The partitioning exhausts the database, i.e. every element of $D$ is in some $D_\ell$,

2. Every execution of $R$ succeeds at finding an interior point,

3. Every database $D_\ell$ has size at most $5\alpha n/12$ (ensuring that we have error at most $5\alpha/6$ from
interpolation),

4. The answers obtained from executing $A$ all have error at most $\alpha/6$.

We now estimate the probabilities of each event. For each $i$ we have $\nu_i \ge -\alpha n/6$ with probability
at least

\[
1 - \exp(-\alpha n \varepsilon / 6) \ge 1 - \alpha\beta/24.
\]

So by a union bound, every $\nu_i$ is at least $(-\alpha n/6)$ with probability at least $1 - \beta/4$. If this is the
case, then item 1 holds because $t_k = k \cdot \alpha n/3 + \nu_1 + \cdots + \nu_k \ge (6/\alpha)(\alpha n/3) + (6/\alpha)(-\alpha n/6) \ge n$.

Moreover, if every $\nu_i \ge -\alpha n/6$, then item 2 also holds with probability at least $1 - \beta/4$. This
is because every $|D_i| \ge \alpha n/3 - \alpha n/6 \ge m$, and hence every execution of $R$ on a subdatabase $D_i$
succeeds with probability $1 - \alpha\beta/24$.

By a similar argument, property 3 holds as long as each noise value is at most $\alpha n/12$, which
happens with probability at least $1 - \beta/4$. Finally, property 4 holds with probability at least $1 - \beta/4$
since

\[
\alpha n \ge \frac{24}{\varepsilon} \log(4/\beta) \log^{2.5}(1 + 6/\alpha).
\]

A union bound over the four properties completes the proof. □

4.2.2 Releasing Thresholds vs. Distribution Learning 

Query release and distribution learning are very similar tasks: A distribution learner can be viewed 
as an algorithm for query release with small error w.r.t. the underlying distribution (rather than 
the fixed input database). We show that the two tasks are equivalent under differential privacy. 

Theorem 4.8. Let $Q$ be a collection of counting queries over a domain $X$.

1. If there exists an $(\varepsilon, \delta)$-differentially private algorithm for releasing $Q$ with $(\alpha, \beta)$-accuracy and
sample complexity $n \ge 256\,\mathrm{VC}(Q) \ln(48/\alpha\beta)/\alpha^2$, then there is an $(\varepsilon, \delta)$-differentially private
$(3\alpha, 2\beta)$-accurate distribution learner w.r.t. $Q$ with sample complexity $n$.

2. If there exists an $(\varepsilon, \delta)$-differentially private $(\alpha, \beta)$-accurate distribution learner w.r.t. $Q$ with
sample complexity $n$, then there is an $(\varepsilon, \delta)$-differentially private query release algorithm for
$Q$ with $(\alpha, \beta)$-accuracy and sample complexity $9n$.

The first direction follows from a standard generalization bound, showing that if a given database
$D$ contains (enough) i.i.d. samples from a distribution $\mathcal{D}$, then (w.h.p.) accuracy with respect to
$D$ implies accuracy with respect to $\mathcal{D}$. We remark that the sample complexity lower bound on $n$
required to apply item 1 of Theorem 4.8 does not substantially restrict its applicability: it is known
that an $(\varepsilon, \delta)$-differentially private algorithm for releasing $Q$ always requires sample complexity
$\Omega(\mathrm{VC}(Q)/\alpha\varepsilon)$ anyway [BLR08].

Proof of Theorem 4.8 item 1. Suppose $A$ is an $(\varepsilon, \delta)$-differentially private algorithm for releasing
$Q$ with $(\alpha, \beta)$-accuracy and sample complexity $n \ge 256\,\mathrm{VC}(Q) \ln(48/\alpha\beta)/\alpha^2$. Fix a distribution $\mathcal{D}$
over $X$ and consider a database $D$ containing $n$ i.i.d. samples from $\mathcal{D}$. Define the algorithm $\tilde{A}$ that
on input $D$ runs $A$ on $D$ to obtain answers $a_q$ for every query $q \in Q$. Afterwards, algorithm $\tilde{A}$
uses linear programming [DNR+09] to construct a distribution $\mathcal{D}'$ such that $|a_q - q(\mathcal{D}')| \le \alpha$
for every $q \in Q$, where $q(\mathcal{D}') = \mathbb{E}_{x \sim \mathcal{D}'}[q(x)]$. This reconstruction always succeeds as long as the
answers $\{a_q\}$ are $\alpha$-accurate, since the empirical distribution of $D$ is a feasible point for the linear
program. Note that $\tilde{A}$ is $(\varepsilon, \delta)$-differentially private (since it is obtained by post-processing $A$).

We first argue that $q(\mathcal{D}')$ is close to $q(D)$ for every $q \in Q$, and then argue that $q(D)$ is close to
$q(\mathcal{D})$. By the utility properties of $A$, with all but probability $\beta$,

\[
|q(\mathcal{D}') - q(D)| \le |q(\mathcal{D}') - a_q| + |a_q - q(D)| \le 2\alpha
\]

for every $q \in Q$.

We now use the following generalization theorem to show that (w.h.p.) $q(D)$ is close to $q(\mathcal{D})$
for every $q \in Q$.

Theorem 4.9 ([AB09]). Let $Q$ be a collection of counting queries over a domain $X$. Let $D =
(x_1, \ldots, x_n)$ consist of i.i.d. samples from a distribution $\mathcal{D}$ over $X$. If $d = \mathrm{VC}(Q)$, then

\[
\Pr\left[ \sup_{q \in Q} |q(D) - q(\mathcal{D})| > \alpha \right] \le [\ldots].
\]

Using the above theorem, together with the fact that $n \ge 256\,\mathrm{VC}(Q) \ln(48/\alpha\beta)/\alpha^2$, we see that
with probability at least $1 - \beta$ we have that $|q(D) - q(\mathcal{D})| \le \alpha$ for every $q \in Q$. By a union
bound (and the triangle inequality) we get that $\tilde{A}$ is $(3\alpha, 2\beta)$-accurate. □


In the special case where $Q = \mathrm{THRESH}_X$ for a totally ordered domain $X$, corresponding to distribution
learning under Kolmogorov distance, the above theorem holds as long as $n \ge 2\ln(2/\beta)/\alpha^2$.
This follows from using the Dvoretzky-Kiefer-Wolfowitz inequality [DKW56, Mas90] in place of
Theorem 4.9.

Theorem 4.10. If there exists an $(\varepsilon, \delta)$-differentially private algorithm for releasing $\mathrm{THRESH}_X$ over
a totally ordered domain $X$ with $(\alpha, \beta)$-accuracy and sample complexity $n \ge 2\ln(2/\beta)/\alpha^2$, then there
is an $(\varepsilon, \delta)$-differentially private $(2\alpha, 2\beta)$-accurate distribution learner under Kolmogorov distance
with sample complexity $n$.
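The sample size condition above can be checked numerically against the Dvoretzky-Kiefer-Wolfowitz tail bound in its standard form $\Pr[\sup_t |F_n(t) - F(t)| > \alpha] \le 2\exp(-2n\alpha^2)$. The Python sketch below is illustrative; the function names are assumptions of this sketch.

```python
import math


def dkw_failure_bound(n, alpha):
    """DKW bound on Pr[sup_t |F_n(t) - F(t)| > alpha] for n i.i.d. samples."""
    return 2.0 * math.exp(-2.0 * n * alpha ** 2)


def samples_from_text(alpha, beta):
    """The sufficient sample size n >= 2 ln(2/beta) / alpha^2 quoted in the text."""
    return math.ceil(2.0 * math.log(2.0 / beta) / alpha ** 2)
```

For example, with $\alpha = 0.05$ and $\beta = 0.01$, `dkw_failure_bound(samples_from_text(0.05, 0.01), 0.05)` is well below $\beta$, so the stated condition is sufficient with room to spare.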

We now show the other direction of the equivalence. 

Lemma 4.11. Suppose A is an {£,6)-differentially private and {a, (3)-accurate distribution learner 
w.r.t. a concept class Q with sample complexity n. Then there is an {£,6)-differentially private 
algorithm A for releasing Q with {a, (3)-accuracy and sample complexity 9n. 

To construct the algorithm A, we note that a distribution learner must perform well on the 
uniform distribution on the rows of any fixed database, and thus must be useful for releasing 
accurate answers for queries on such a database. Thus if we have a distribution learner A, the 
mechanism A that samples m rows (with replacement) from its input database D £ {X x {0,1})"' 
and runs A on the result should output accurate answers for queries with respect to D. The random 
sampling has two competing effects on privacy. On one hand, the possibility that an individual 
is sampled multiple times incurs additional privacy loss. On the other hand, if n > m, then a 
“secrecy-of-the-sample” argument shows that random sampling actually improves privacy, since 
any individual is unlikely to have their data affect the computation at all. We show that if n is 
only a constant factor larger than m, these two effects offset, and the resulting mechanism is still 
differentially private.

Proof. Consider a database $D \in X^{9n}$. Let $\mathcal{D}$ denote the uniform distribution over the rows of $D$, and
let $\mathcal{D}'$ be the distribution learned. Consider the algorithm $\tilde{\mathcal{A}}$ that subsamples (with replacement) $n$
rows from $D$ and runs $\mathcal{A}$ on it to obtain a distribution $\mathcal{D}'$. Afterwards, algorithm $\tilde{\mathcal{A}}$ answers every
threshold query $q \in Q$ with $a_q = q(\mathcal{D}') = \mathbb{E}_{x \sim \mathcal{D}'}[q(x)]$.

Note that drawing $n$ i.i.d. samples from $\mathcal{D}$ is equivalent to subsampling $n$ rows of $D$ (with
replacement). Then with probability at least $1 - \beta$, the distribution $\mathcal{D}'$ returned by $\mathcal{A}$ is such that
for every $q \in Q$

$$|q(\mathcal{D}') - q(\mathcal{D})| = |q(\mathcal{D}') - q(D)| \le \alpha,$$

showing that $\tilde{\mathcal{A}}$ is $(\alpha, \beta)$-accurate. □
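
The construction used in this proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `empirical_learner` is a hypothetical, non-private stand-in for the distribution learner (it returns the empirical distribution of its sample), and the names `release_by_subsampling` and `threshold` are ours rather than the paper's.

```python
import random
from collections import Counter


def empirical_learner(sample):
    """Hypothetical stand-in for the distribution learner.

    Returns the empirical distribution of the sample as {point: probability}.
    It is NOT differentially private; it only makes the sketch runnable.
    """
    counts = Counter(sample)
    return {x: c / len(sample) for x, c in counts.items()}


def threshold(x):
    """Threshold query c_x(y) = 1 iff y <= x (assumed convention)."""
    return lambda y: 1 if y <= x else 0


def release_by_subsampling(database, n, learner, queries, rng):
    """Subsample n rows with replacement, run the learner on the sample,
    and answer each query q by its expectation under the learned distribution."""
    sample = [rng.choice(database) for _ in range(n)]
    learned = learner(sample)
    return [sum(p * q(x) for x, p in learned.items()) for q in queries]
```

Calling `release_by_subsampling(db, n, empirical_learner, [threshold(5)], random.Random(0))` returns one answer per query; with a private learner plugged in, the privacy of the wrapper is what the secrecy-of-the-sample lemma addresses.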

We’ll now use a secrecy-of-the-sample argument (refining an argument that appeared implicitly
in [KLN+11]) to show that $\tilde{\mathcal{A}}$ (from Lemma 4.11) is differentially private whenever $\mathcal{A}$ is differentially
private.

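Lemma 4.12. Fix $\epsilon \le 1$ and let $\mathcal{A}$ be an $(\epsilon, \delta)$-differentially private algorithm operating on
databases of size $m$. For $n \ge 2m$, construct an algorithm $\tilde{\mathcal{A}}$ that on input a database $D$ of size $n$
subsamples (with replacement) $m$ rows from $D$ and runs $\mathcal{A}$ on the result. Then $\tilde{\mathcal{A}}$ is
$(\tilde{\epsilon}, \tilde{\delta})$-differentially private for

$$\tilde{\epsilon} = \frac{6\epsilon m}{n} \quad \text{and} \quad \tilde{\delta} = \exp(6\epsilon m/n) \cdot \frac{4m}{n} \cdot \delta.$$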
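
To get a feel for the parameters in Lemma 4.12, the following Python sketch evaluates $\tilde{\epsilon}$ and $\tilde{\delta}$ and compares $\tilde{\epsilon}$ with the un-simplified quantity $m \ln\frac{1 + (e^{\epsilon}-1)/n}{1 + (e^{-\epsilon}-1)/n}$ that the proof bounds by $6\epsilon m/n$. The function names are ours, and the exact expression reflects our reading of the proof.

```python
import math


def subsampled_privacy(eps, delta, m, n):
    """Return (eps_tilde, delta_tilde) as stated in Lemma 4.12."""
    if not 0 < eps <= 1:
        raise ValueError("the lemma assumes 0 < eps <= 1")
    if n < 2 * m:
        raise ValueError("the lemma assumes n >= 2m")
    eps_tilde = 6 * eps * m / n
    delta_tilde = math.exp(eps_tilde) * (4 * m / n) * delta
    return eps_tilde, delta_tilde


def exact_eps(eps, m, n):
    """Un-simplified privacy loss m * ln(a / b) with
    a = 1 + (e^eps - 1)/n and b = 1 + (e^-eps - 1)/n."""
    a = 1 + (math.exp(eps) - 1) / n
    b = 1 + (math.exp(-eps) - 1) / n
    return m * math.log(a / b)
```

For example, with $n = 9m$ (the setting of Lemma 4.11) and $\epsilon = 1$, `subsampled_privacy` gives $\tilde{\epsilon} = 2/3$ and $\tilde{\delta} < \delta$, so subsampling costs nothing in the stated parameters.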

Proof. Let $D, D'$ be adjacent databases of size $n$, and suppose without loss of generality that they
differ on the last row. Let $T$ be a random variable denoting the multiset of indices sampled by
$\tilde{\mathcal{A}}$, and let $\ell(T)$ be the multiplicity of index $n$ in $T$. Fix a subset $S$ of the range of $\tilde{\mathcal{A}}$. For each
$k = 0, 1, \ldots, m$ let

$$p_k = \Pr[\ell(T) = k] = \binom{m}{k} n^{-k} (1 - 1/n)^{m-k} = \binom{m}{k} (n-1)^{-k} (1 - 1/n)^{m},$$

$$q_k = \Pr[\mathcal{A}(D|_T) \in S \mid \ell(T) = k],$$

$$q'_k = \Pr[\mathcal{A}(D'|_T) \in S \mid \ell(T) = k].$$

Here, $D|_T$ denotes the subsample of $D$ consisting of the indices in $T$, and similarly for $D'|_T$. Note
that $q_0 = q'_0$, since $D|_T = D'|_T$ if index $n$ is not sampled. Our goal is to show that

$$\Pr[\tilde{\mathcal{A}}(D) \in S] = \sum_{k=0}^{m} p_k q_k \le e^{\tilde{\epsilon}} \sum_{k=0}^{m} p_k q'_k + \tilde{\delta} = e^{\tilde{\epsilon}} \Pr[\tilde{\mathcal{A}}(D') \in S] + \tilde{\delta}.$$

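To do this, observe that by privacy, $q_k \le e^{\epsilon} q_{k-1} + \delta$, so

$$q_k \le e^{k\epsilon} q_0 + \frac{e^{k\epsilon} - 1}{e^{\epsilon} - 1}\,\delta.$$

Hence,

$$
\begin{aligned}
\Pr[\tilde{\mathcal{A}}(D) \in S] &= \sum_{k=0}^{m} p_k q_k \\
&\le \sum_{k=0}^{m} \binom{m}{k} (n-1)^{-k} (1 - 1/n)^{m} \left( e^{k\epsilon} q_0 + \frac{e^{k\epsilon} - 1}{e^{\epsilon} - 1}\,\delta \right) \\
&= q_0 (1 - 1/n)^{m} \sum_{k=0}^{m} \binom{m}{k} \left( \frac{e^{\epsilon}}{n-1} \right)^{k}
 + \frac{\delta}{e^{\epsilon} - 1} (1 - 1/n)^{m} \sum_{k=0}^{m} \binom{m}{k} \left( \frac{e^{\epsilon}}{n-1} \right)^{k}
 - \frac{\delta}{e^{\epsilon} - 1} \\
&= q_0 (1 - 1/n)^{m} \left( 1 + \frac{e^{\epsilon}}{n-1} \right)^{m}
 + \frac{\delta}{e^{\epsilon} - 1} (1 - 1/n)^{m} \left( 1 + \frac{e^{\epsilon}}{n-1} \right)^{m}
 - \frac{\delta}{e^{\epsilon} - 1} \\
&= q_0 \left( 1 - \frac{1}{n} + \frac{e^{\epsilon}}{n} \right)^{m}
 + \frac{\left( 1 - \frac{1}{n} + \frac{e^{\epsilon}}{n} \right)^{m} - 1}{e^{\epsilon} - 1}\,\delta. \qquad (1)
\end{aligned}
$$

Similarly, we also have that

$$\Pr[\tilde{\mathcal{A}}(D') \in S] \ge q_0 \left( 1 - \frac{1}{n} + \frac{e^{-\epsilon}}{n} \right)^{m}
 - \frac{1 - \left( 1 - \frac{1}{n} + \frac{e^{-\epsilon}}{n} \right)^{m}}{1 - e^{-\epsilon}}\,\delta. \qquad (2)$$

Combining inequalities 1 and 2 we get that

$$\Pr[\tilde{\mathcal{A}}(D) \in S] \le
\left( \frac{1 - \frac{1}{n} + \frac{e^{\epsilon}}{n}}{1 - \frac{1}{n} + \frac{e^{-\epsilon}}{n}} \right)^{m}
\left( \Pr[\tilde{\mathcal{A}}(D') \in S] + \frac{1 - \left( 1 - \frac{1}{n} + \frac{e^{-\epsilon}}{n} \right)^{m}}{1 - e^{-\epsilon}}\,\delta \right)
+ \frac{\left( 1 - \frac{1}{n} + \frac{e^{\epsilon}}{n} \right)^{m} - 1}{e^{\epsilon} - 1}\,\delta,$$

proving that $\tilde{\mathcal{A}}$ is $(\tilde{\epsilon}, \tilde{\delta})$-differentially private for

$$\tilde{\epsilon} \le m \cdot \ln \left( \frac{1 + \frac{e^{\epsilon} - 1}{n}}{1 + \frac{e^{-\epsilon} - 1}{n}} \right) \le \frac{6\epsilon m}{n}$$

and

$$
\begin{aligned}
\tilde{\delta} &\le \exp(6\epsilon m/n) \cdot \frac{1 - \left( 1 + \frac{e^{-\epsilon} - 1}{n} \right)^{m}}{1 - e^{-\epsilon}}\,\delta
 + \frac{\left( 1 + \frac{e^{\epsilon} - 1}{n} \right)^{m} - 1}{e^{\epsilon} - 1}\,\delta \\
&\le \exp(6\epsilon m/n) \cdot \frac{1 - \exp\left( 2\,\frac{e^{-\epsilon} - 1}{n/m} \right)}{1 - e^{-\epsilon}}\,\delta
 + \frac{\exp\left( \frac{e^{\epsilon} - 1}{n/m} \right) - 1}{e^{\epsilon} - 1}\,\delta \\
&\le \exp(6\epsilon m/n) \cdot \frac{2m}{n}\,\delta + \frac{2m}{n}\,\delta \\
&\le \exp(6\epsilon m/n) \cdot \frac{4m}{n} \cdot \delta.
\end{aligned}
$$

□

5 PAC Learning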


5.1 Definitions 

A concept $c : X \to \{0,1\}$ is a predicate that labels examples taken from the domain $X$. A concept
class $C$ over $X$ is a set of concepts over the domain $X$. A learner is given examples sampled from
an unknown probability distribution $\mathcal{D}$ over $X$ that are labeled according to an unknown target
concept $c \in C$ and outputs a hypothesis $h$ that approximates the target concept with respect to
the distribution $\mathcal{D}$. More precisely,

Definition 5.1. The generalization error of a hypothesis $h : X \to \{0,1\}$ (with respect to a target
concept $c$ and distribution $\mathcal{D}$) is defined by $\mathrm{error}_{\mathcal{D}}(c, h) = \Pr_{x \sim \mathcal{D}}[h(x) \ne c(x)]$. If $\mathrm{error}_{\mathcal{D}}(c, h) \le \alpha$
we say that $h$ is an $\alpha$-good hypothesis for $c$ on $\mathcal{D}$.

Definition 5.2 (PAC Learning [Val84]). Algorithm $\mathcal{A}$ is an $(\alpha, \beta)$-accurate PAC learner for a concept
class $C$ over $X$ using hypothesis class $H$ with sample complexity $m$ if for all target concepts
$c \in C$ and all distributions $\mathcal{D}$ on $X$, given an input of $m$ samples $D = ((x_1, c(x_1)), \ldots, (x_m, c(x_m)))$,
where each $x_i$ is drawn i.i.d. from $\mathcal{D}$, algorithm $\mathcal{A}$ outputs a hypothesis $h \in H$ satisfying
$\Pr[\mathrm{error}_{\mathcal{D}}(c, h) \le \alpha] \ge 1 - \beta$.

The probability is taken over the random choice of the examples in $D$ and the coin tosses of
the learner $\mathcal{A}$. If $H \subseteq C$ then $\mathcal{A}$ is called proper, otherwise, it is called improper.

Definition 5.3. The empirical error of a hypothesis $h$ on a labeled sample $S = ((x_1, \ell_1), \ldots, (x_m, \ell_m))$
is $\mathrm{error}_S(h) = \frac{1}{m} |\{i : h(x_i) \ne \ell_i\}|$. If $\mathrm{error}_S(h) \le \alpha$ we say $h$ is $\alpha$-consistent with $S$.
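
As a concrete reading of Definition 5.3, the sketch below computes the empirical error of a threshold hypothesis on a labeled sample and checks $\alpha$-consistency. The helper names are ours, and we assume the convention $c_x(y) = 1$ iff $y \le x$ for thresholds.

```python
def threshold(x):
    """Threshold concept c_x with c_x(y) = 1 iff y <= x (assumed convention)."""
    return lambda y: 1 if y <= x else 0


def empirical_error(h, sample):
    """Fraction of labeled examples (x_i, l_i) in the sample with h(x_i) != l_i."""
    if not sample:
        raise ValueError("sample must be non-empty")
    return sum(1 for x, label in sample if h(x) != label) / len(sample)


def is_consistent(h, sample, alpha):
    """True iff h is alpha-consistent with the sample."""
    return empirical_error(h, sample) <= alpha
```

On the sample `[(1, 1), (2, 1), (3, 0), (4, 0)]`, the hypothesis `threshold(2)` makes no mistakes, while `threshold(3)` mislabels the single point 3.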

Classical results in statistical learning theory show that a sample of size $\Theta(\mathrm{VC}(C))$ is both
necessary and sufficient for PAC learning a concept class $C$. That $O(\mathrm{VC}(C))$ samples suffice
follows from a “generalization” argument: for any concept $c$ and distribution $\mathcal{D}$, with probability
at least $1 - \beta$ over $m \ge O_{\alpha,\beta}(\mathrm{VC}(C))$ random labeled examples, every concept $h \in C$ that agrees
with $c$ on the examples has error at most $\alpha$ on $\mathcal{D}$. Therefore, $C$ can be properly learned by finding
any hypothesis $h \in C$ that agrees with the given examples.

Recall the class of threshold functions, which are concepts defined over a totally ordered domain
$X$ by $\mathrm{THRESH}_X = \{c_x : x \in X\}$ where $c_x(y) = 1$ iff $y \le x$. The class of threshold functions has VC
dimension $\mathrm{VC}(\mathrm{THRESH}_X) = 1$, and hence can be learned with $O_{\alpha,\beta}(1)$ samples.

A private learner is a PAC learner that is differentially private. Following [KLN+11], we consider
algorithms $\mathcal{A} : (X \times \{0,1\})^m \to H$, where $H$ is a hypothesis class, and require that

1. $\mathcal{A}$ is an $(\alpha, \beta)$-accurate PAC learner for a concept class $C$ with sample complexity $m$, and

2. $\mathcal{A}$ is $(\epsilon, \delta)$-differentially private.

Note that while we require utility (PAC learning) to hold only when the database $D$ consists
of random labeled examples from a distribution, the requirement of differential privacy applies to
every pair of neighboring databases $D \sim D'$, including those that do not correspond to examples
labeled by any concept.

Recall the relationship between distribution learning and releasing thresholds, where accuracy
is measured w.r.t. the underlying distribution in the former and w.r.t. the fixed input database in
the latter. Analogously, we now define the notion of an empirical learner, which is similar to a PAC
learner except that accuracy is measured w.r.t. the fixed input database.

Definition 5.4 (Empirical Learner). Algorithm $\mathcal{A}$ is an $(\alpha, \beta)$-accurate empirical learner for a
concept class $C$ over $X$ using hypothesis class $H$ with sample complexity $m$ if for every $c \in C$
and for every database $D = ((x_1, c(x_1)), \ldots, (x_m, c(x_m))) \in (X \times \{0,1\})^m$ algorithm $\mathcal{A}$ outputs a
hypothesis $h \in H$ satisfying $\Pr[\mathrm{error}_D(c, h) \le \alpha] \ge 1 - \beta$.

The probability is taken over the coin tosses of A. 

Note that without privacy (and ignoring computational efficiency) identifying a hypothesis with
small empirical error is trivial for every concept class $C$ and for every database of size at least 1.
This is not the case with $(\epsilon, \delta)$-differential privacy,^ and the sample complexity of every empirical
learner for a concept class $C$ is at least $\Omega(\mathrm{VC}(C))$:

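Theorem 5.5. For every $\alpha, \beta \le 1/8$, every $\delta \le \frac{1}{8n}$ and $\epsilon > 0$, if $\mathcal{A}$ is an $(\epsilon, \delta)$-differentially private
$(\alpha, \beta)$-accurate empirical learner for a class $C$ with sample complexity $n$, then $n = \Omega\left(\frac{1}{\alpha\epsilon} \mathrm{VC}(C)\right)$.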

The proof of Theorem 5.5 is very similar to the analysis of [BLR08] for lower bounding the sample
complexity of releasing approximate answers for queries in the class $C$. As we will see in the next
section, at least in some cases (namely, for threshold functions) the sample complexity must also
depend on the size of the domain $X$.

Proof of Theorem 5.5. Fix $d < \mathrm{VC}(C)$, let $x_0, x_1, x_2, \ldots, x_d$ be shattered by $C$, and denote $S =
\{x_1, \ldots, x_d\}$. Let $D$ denote a database containing $(1 - 8\alpha)n$ copies of $x_0$ and $8\alpha n/d$ copies of every
$x_i \in S$. For a concept $c$ we use $D_c$ to denote the database $D$ labeled by $c$. We will consider concepts
that label $x_0$ as 0, and label exactly half of the elements in $S$ as 1. To that end, initiate $\tilde{C} = \emptyset$,
and for every subset $S' \subseteq S$ of size $|S'| = |S|/2$, add to $\tilde{C}$ one concept $c \in C$ s.t. $c(x_0) = 0$ and for
every $x_i \in S$ it holds that $c(x_i) = 1$ iff $x_i \in S'$ (such a concept exists since $S \cup \{x_0\}$ is shattered by
$C$).

Now, let $c \in \tilde{C}$ be chosen uniformly at random, let $x \in S$ be a random element s.t. $c(x) = 1$,
and let $y \in S$ be a random element s.t. $c(y) = 0$. Also let $c' \in \tilde{C}$ be s.t. $c'(x) = 0$, $c'(y) = 1$, and
$c'(x_i) = c(x_i)$ for every $x_i \in S \setminus \{x, y\}$. Note that the marginal distributions on $c$ and on $c'$ are
identical, and denote $h = \mathcal{A}(D_c)$ and $h' = \mathcal{A}(D_{c'})$.

Observe that $x$ is a random element of $S$ that is labeled as 1 in $D_c$, and that an $\alpha$-consistent
hypothesis for $D_c$ must label at least $(1 - \frac{1}{8})d$ such elements as 1. Hence, by the utility properties
of $\mathcal{A}$, we have that

$$\Pr[h(x) = 1] \ge (1 - \beta)(1 - 1/8) > 3/4.$$

Similarly, $x$ is a random element of $S$ that is labeled as 0 in $D_{c'}$, and an $\alpha$-consistent hypothesis
for $D_{c'}$ must not label more than $d/8$ such elements as 1. Hence,

$$\Pr[h'(x) = 1] \le \beta + (1 - \beta)\frac{1}{8} < 1/4.$$

Finally, as $D_c$ and $D_{c'}$ differ in at most $16\alpha n/d$ entries, differential privacy ensures that

$$3/4 \le \Pr[h(x) = 1] \le e^{16\alpha\epsilon n/d} \cdot \Pr[h'(x) = 1] + e^{16\alpha\epsilon n/d} \cdot 16\alpha n\delta/d \le e^{16\alpha\epsilon n/d} \cdot 1/2,$$

showing that $n \ge \frac{d}{16\alpha\epsilon} \ln(3/2)$. □

^The lower bound in Theorem 5.5 also holds for label private empirical learners, that are only required to provide
privacy for the labels in the database.

5.2 Private Learning of Thresholds vs. the Interior Point Problem

We show that with differential privacy, there is a $\Theta(1/\alpha)$ multiplicative relationship between the
sample complexities of properly PAC learning thresholds with $(\alpha, \beta)$-accuracy and of solving the
interior point problem with error probability $\Theta(\beta)$. Specifically, we show

Theorem 5.6. Let $X$ be a totally ordered domain. Then,

1. If there exists an $(\epsilon, \delta)$-differentially private algorithm solving the interior point problem
on $X$ with error probability $\beta$ and sample complexity $n$, then there is a $(2\epsilon, (1 + e^{\epsilon})\delta)$-differentially
private $(2\alpha, 2\beta)$-accurate proper PAC learner for $\mathrm{THRESH}_X$ with sample complexity
$\max\left\{ \frac{n}{2\alpha}, \frac{4\ln(2/\beta)}{\alpha} \right\}$.

2. If there exists an $(\epsilon, \delta)$-differentially private $(\alpha, \beta)$-accurate proper PAC learner for $\mathrm{THRESH}_X$
with sample complexity $n$, then there is a $(2\epsilon, (1 + e^{\epsilon})\delta)$-differentially private algorithm that
solves the interior point problem on $X$ with error $\beta$ and sample complexity $27\alpha n$.

We show this equivalence in two phases. In the first, we show a $\Theta(1/\alpha)$ relationship between the
sample complexity of solving the interior point problem and the sample complexity of empirically
learning thresholds. We then use generalization and resampling arguments to show that with
privacy, this latter task is equivalent to learning with samples from a distribution.

Lemma 5.7. Let $X$ be a totally ordered domain. Then,

1. If there exists an $(\epsilon, \delta)$-differentially private algorithm solving the interior point problem on $X$
with error probability $\beta$ and sample complexity $n$, then there is a $(2\epsilon, (1 + e^{\epsilon})\delta)$-differentially
private algorithm for properly and empirically learning thresholds with $(\alpha, \beta)$-accuracy and
sample complexity $n/(2\alpha)$.

2. If there exists an $(\epsilon, \delta)$-differentially private algorithm that is able to properly and empirically
learn thresholds on $X$ with $(\alpha, \beta)$-accuracy and sample complexity $n/(3\alpha)$, then there is a
$(2\epsilon, (1 + e^{\epsilon})\delta)$-differentially private algorithm that solves the interior point problem on $X$ with
error $\beta$ and sample complexity $n$.

Proof. For the first direction, let $\mathcal{A}$ be a private algorithm for the interior point problem on
databases of size $n$. Consider the algorithm $\mathcal{A}'$ that, on input a database $D$ of size $n/(2\alpha)$, runs $\mathcal{A}$
on a database $D'$ consisting of the largest $n/2$ elements of $D$ that are labeled 1 and the smallest
$n/2$ elements of $D$ that are labeled 0. If there are not enough of either such element, pad $D'$ with
$\min\{X\}$'s or $\max\{X\}$'s respectively. Note that if $x$ is an interior point of $D'$ then $c_x$ is a threshold
function with error at most $\frac{n/2}{n/(2\alpha)} = \alpha$ on $D$, and is hence $\alpha$-consistent with $D$. For privacy, note that
changing one row of $D$ changes at most two rows of $D'$. Hence, applying algorithm $\mathcal{A}$ preserves
$(2\epsilon, (e^{\epsilon} + 1)\delta)$-differential privacy.
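
The first direction can be sketched in Python as follows. This is an illustration only: `median_stand_in` is a hypothetical, non-private placeholder for the interior point algorithm, and all function names are ours. We assume $n$ is even and that thresholds follow the convention $c_x(y) = 1$ iff $y \le x$.

```python
def build_interior_instance(labeled, n, x_min, x_max):
    """Build D' from a labeled database: the largest n/2 points labeled 1 and
    the smallest n/2 points labeled 0, padded with x_min / x_max if needed."""
    if n < 2 or n % 2:
        raise ValueError("n must be an even integer >= 2")
    half = n // 2
    ones = sorted(x for x, y in labeled if y == 1)
    zeros = sorted(x for x, y in labeled if y == 0)
    top_ones = ones[-half:]        # largest n/2 points labeled 1
    bottom_zeros = zeros[:half]    # smallest n/2 points labeled 0
    top_ones = [x_min] * (half - len(top_ones)) + top_ones
    bottom_zeros = bottom_zeros + [x_max] * (half - len(bottom_zeros))
    return top_ones + bottom_zeros


def median_stand_in(points):
    """Hypothetical NON-private interior point solver (lower median)."""
    return sorted(points)[(len(points) - 1) // 2]


def learn_threshold_empirically(labeled, n, x_min, x_max, interior_point_solver):
    """Return x such that the hypothesis is the threshold c_x."""
    return interior_point_solver(build_interior_instance(labeled, n, x_min, x_max))
```

Any interior point of the constructed database yields a threshold with at most $n/2$ mistakes on the original labeled database, which is the source of the $n/(2\alpha)$ sample size in the lemma.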

For the reverse direction, suppose $\mathcal{A}'$ privately finds an $\alpha$-consistent threshold function for
databases of size $n/(3\alpha)$. Define $\mathcal{A}$ on a database $D' \in X^n$ to label the smaller $n/2$ points 1 and
the larger $n/2$ points 0 to obtain a labeled database $D \in (X \times \{0,1\})^n$, pad $D$ with an equal number
of $(\min\{X\}, 1)$ and $(\max\{X\}, 0)$ entries to make it of size $n/(3\alpha)$, and run $\mathcal{A}'$ on the result. Note
that if $c_x$ is a threshold function with error at most $\alpha$ on $D$ then $x$ is an interior point of $D'$, as
otherwise $c_x$ has error at least $\frac{n/2}{n/(3\alpha)} = 3\alpha/2 > \alpha$ on $D$. For privacy, note that changing one row of $D'$
changes at most two rows of $D$. Hence, applying algorithm $\mathcal{A}'$ preserves $(2\epsilon, (e^{\epsilon} + 1)\delta)$-differential
privacy. □

Now we show that the task of privately outputting an almost consistent hypothesis on any
fixed database is essentially equivalent to the task of private (proper) PAC learning. One direction
follows immediately from a standard generalization bound for learning thresholds:

Lemma 5.8. Any algorithm $\mathcal{A}$ for empirically learning $\mathrm{THRESH}_X$ with $(\alpha, \beta)$-accuracy is also a
$(2\alpha, \beta + \beta')$-accurate PAC learner for $\mathrm{THRESH}_X$ when given at least $\max\{n, 4\ln(2/\beta')/\alpha\}$ samples.

Proof. Let $\mathcal{D}$ be a distribution over a totally ordered domain $X$ and fix a target concept $c =
q_x \in \mathrm{THRESH}_X$. It suffices to show that for a sample $S = ((x_1, c(x_1)), \ldots, (x_m, c(x_m)))$ where $m \ge
4\ln(2/\beta')/\alpha$ and the $x_i$ are drawn i.i.d. from $\mathcal{D}$, it holds that

$$\Pr[\exists h \in C : \mathrm{error}_{\mathcal{D}}(h, c) > 2\alpha \wedge \mathrm{error}_S(h) \le \alpha] \le \beta'.$$

Let $x^- \le x$ be the largest point with $\mathrm{error}_{\mathcal{D}}(q_{x^-}, c) > 2\alpha$. If some $y \le x$ has $\mathrm{error}_{\mathcal{D}}(q_y, c) > 2\alpha$
then $y \le x^-$, and hence for any sample $S$, $\mathrm{error}_S(q_{x^-}) \le \mathrm{error}_S(q_y)$. Similarly let $x^+ \ge x$ be the
smallest point with $\mathrm{error}_{\mathcal{D}}(q_{x^+}, c) > 2\alpha$. Let $c^- = q_{x^-}$ and $c^+ = q_{x^+}$. Then it suffices to show that

$$\Pr[\mathrm{error}_S(c^-) \le \alpha \vee \mathrm{error}_S(c^+) \le \alpha] \le \beta'.$$

Concentrating first on $c^-$, we define the error region $R^- = (x^-, x] \cap X$ as the interval where $c^-$
disagrees with $c$. By a Chernoff bound, the probability that after $m$ independent samples from $\mathcal{D}$,
fewer than $\alpha m$ appear in $R^-$ is at most $\exp(-\alpha m/4) \le \beta'/2$. The same argument holds for $c^+$, so
the result follows by a union bound. □
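
The sample size in Lemma 5.8 is easy to evaluate numerically. The Python sketch below (function names are ours) computes $\max\{n, 4\ln(2/\beta')/\alpha\}$, rounded up to an integer, together with the Chernoff tail $\exp(-\alpha m/4)$ used in the proof.

```python
import math


def pac_sample_size(n, alpha, beta_prime):
    """Smallest integer m with m >= max(n, 4 ln(2/beta') / alpha)."""
    if not (0 < alpha < 1 and 0 < beta_prime < 1):
        raise ValueError("alpha and beta_prime must lie in (0, 1)")
    return max(n, math.ceil(4 * math.log(2 / beta_prime) / alpha))


def chernoff_tail(alpha, m):
    """The bound exp(-alpha * m / 4) on the failure probability for one side."""
    return math.exp(-alpha * m / 4)
```

For instance, with $\alpha = 0.1$ and $\beta' = 0.05$ the generalization term requires 148 samples, at which point each one-sided failure probability is below $\beta'/2 = 0.025$.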

In general, an algorithm that can output an $\alpha$-consistent hypothesis from concept class $C$ can also
be used to learn $C$ using $\max\{n, 64\,\mathrm{VC}(C) \log(512/(\alpha\beta'))/\alpha\}$ samples [BEHW89]. The concept class
of thresholds has VC dimension 1, so the generalization bound for thresholds saves an $O(\log(1/\alpha))$
factor over the generic statement.

For the other direction, we note that a distribution-free learner must perform well on the uniform 
distribution on the rows of any fixed database, and thus must be useful for outputting a consistent 
hypothesis on such a database. 

Lemma 5.9. Suppose $\mathcal{A}$ is an $(\epsilon, \delta)$-differentially private $(\alpha, \beta)$-accurate PAC learner for a concept
class $C$ with sample complexity $m$. Then there is an $(\epsilon, \delta)$-differentially private $(\alpha, \beta)$-accurate
empirical learner for $C$ with sample complexity $n = 9m$. Moreover, if $\mathcal{A}$ is proper, then so is the
resulting empirical learner.

Proof. Consider a database $D = \{(x_i, y_i)\} \in (X \times \{0,1\})^n$. Let $\mathcal{D}$ denote the uniform distribution
over the rows of $D$. Then drawing $m$ i.i.d. samples from $\mathcal{D}$ is equivalent to subsampling $m$ rows
of $D$ (with replacement). Consider the algorithm $\tilde{\mathcal{A}}$ that subsamples (with replacement) $m$ rows
from $D$ and runs $\mathcal{A}$ on it. Then with probability at least $1 - \beta$, algorithm $\tilde{\mathcal{A}}$ outputs an $\alpha$-good
hypothesis on $\mathcal{D}$, which is in turn an $\alpha$-consistent hypothesis for $D$. Moreover, by Lemma 4.12
(secrecy-of-the-sample), algorithm $\tilde{\mathcal{A}}$ is $(\epsilon, \delta)$-differentially private. □

6 Thresholds in High Dimension 

We next show that the bound of $\Omega(\log^* |X|)$ on the sample complexity of private proper-learners
for $\mathrm{THRESH}_X$ extends to conjunctions of $\ell$ independent threshold functions in $\ell$ dimensions. As we
will see, every private proper-learner for this class requires a sample of $\Omega(\ell \cdot \log^* |X|)$ elements.
This also yields a similar lower bound for the task of query release, as in general an algorithm for
query release can be used to construct a private learner.

The significance of this lower bound is twofold. First, for reasonable settings of parameters
(e.g. $\delta$ is negligible and items in $X$ are of polynomial bit length in $n$), our $\Omega(\log^* |X|)$ lower bound
for threshold functions is dominated by the dependence on $\log(1/\delta)$ in the upper bound. However,
$\ell \cdot \log^* |X|$ can still be much larger than $\log(1/\delta)$, even when $\delta$ is negligible in the bit length of
items in $X^{\ell}$. Second, the lower bound for threshold functions only yields a separation between the
sample complexities of private and non-private learning for a class of VC dimension 1. Since the
concept class of $\ell$-dimensional thresholds has VC dimension of $\ell$, we obtain an $\omega(\mathrm{VC}(C))$ lower
bound for concept classes even with arbitrarily large VC dimension.

Consider the following extension of $\mathrm{THRESH}_X$ to $\ell$ dimensions.

Definition 6.1. For a totally ordered set $X$ and $\vec{a} = (a_1, \ldots, a_{\ell}) \in X^{\ell}$ define the concept $c_{\vec{a}} :
X^{\ell} \to \{0,1\}$ where $c_{\vec{a}}(\vec{x}) = 1$ if and only if for every $1 \le i \le \ell$ it holds that $x_i \le a_i$. Define the
concept class of all thresholds over $X^{\ell}$ as $\mathrm{THRESH}^{\ell}_X = \{c_{\vec{a}}\}_{\vec{a} \in X^{\ell}}$.

Note that the VC dimension of $\mathrm{THRESH}^{\ell}_X$ is $\ell$. We obtain the following lower bound on the
sample complexity of privately learning $\mathrm{THRESH}^{\ell}_X$.

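Theorem 6.2. For every $n, \ell \in \mathbb{N}$, $\alpha > 0$, and $\delta \le \ell^2/(1500n^2)$, any ($\epsilon$ = ^, $\delta$)-differentially private
and $(\alpha, \beta = \frac{1}{8})$-accurate proper learner for $\mathrm{THRESH}^{\ell}_X$ requires sample complexity $n = \Omega\left(\frac{\ell}{\alpha} \log^* |X|\right)$.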

This is the result of a general hardness amplification theorem for private proper learning. We
show that if privately learning a concept class $C$ requires sample complexity $n$, then learning the
class $C^{\ell}$ of conjunctions of $\ell$ different concepts from $C$ requires sample complexity $\Omega(\ell n)$.

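Definition 6.3. For $\ell \in \mathbb{N}$, a data universe $X$ and a concept class $C$ over $X$, define a concept
class $C^{\ell}$ over $X^{\ell}$ to consist of all $\vec{c} = (c_1, \ldots, c_{\ell})$, where $\vec{c} : X^{\ell} \to \{0,1\}$ is defined by $\vec{c}(\vec{x}) =
c_1(x_1) \wedge c_2(x_2) \wedge \cdots \wedge c_{\ell}(x_{\ell})$.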
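
A small Python sketch of Definition 6.3, instantiated with thresholds, may help fix the notation. The helper names are ours, and we again assume the convention $c_x(y) = 1$ iff $y \le x$.

```python
def threshold(a):
    """One-dimensional threshold c_a(y) = 1 iff y <= a (assumed convention)."""
    return lambda y: 1 if y <= a else 0


def conjunction(concepts):
    """Build the concept (c_1, ..., c_l) on X^l: the AND of c_i applied to
    the i-th coordinate of the input point."""
    def c(point):
        if len(point) != len(concepts):
            raise ValueError("point must have one coordinate per concept")
        return int(all(ci(xi) == 1 for ci, xi in zip(concepts, point)))
    return c
```

With `c = conjunction([threshold(3), threshold(5)])`, the point `(2, 5)` is labeled 1 and `(4, 1)` is labeled 0. A point whose coordinates all equal the minimum of the domain is labeled 1 by every such conjunction, which is the kind of common positive element that the amplification theorem relies on.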

Theorem 6.4. Let $\alpha, \beta, \epsilon, \delta > 0$. Let $C$ be a concept class over a data universe $X$, and assume
there is a domain element $p_1 \in X$ s.t. $c(p_1) = 1$ for every $c \in C$. Let $\mathcal{D}$ be a distribution over
databases containing $n$ examples from $X$ labeled by a concept in $C$, and suppose that every $(\epsilon, \delta)$-differentially
private algorithm fails to find an $(\alpha/\beta)$-consistent hypothesis $h \in C$ for $D \sim \mathcal{D}$ with
probability at least $2\beta$. Then any $(\epsilon, \delta)$-differentially private and $(\alpha, \beta)$-accurate proper learner for
$C^{\ell}$ requires sample complexity $\Omega(\ell n)$.

Note that in the above theorem we assumed the existence of a domain element $p_1 \in X$ on
which every concept in $C$ evaluates to 1. To justify the necessity of such an assumption, consider
the class of point functions over a domain $X$ defined as $\mathrm{POINT}_X = \{c_x : x \in X\}$ where $c_x(y) = 1$
iff $y = x$. As was shown in [BNS13b], this class can be privately learned using $O_{\alpha,\beta,\epsilon,\delta}(1)$ labeled
examples (i.e., the sample complexity does not depend on $|X|$). Observe that since there is no
$x \in X$ on which every point concept evaluates to 1, we cannot use Theorem 6.4 to lower bound
the sample complexity of privately learning $\mathrm{POINT}^{\ell}_X$. Indeed, the class $\mathrm{POINT}^{\ell}_X$ is identical (up to
renaming of domain elements) to the class $\mathrm{POINT}_{X^{\ell}}$, and can be privately learned using $O_{\alpha,\beta,\epsilon,\delta}(1)$
labeled examples.

Remark 6.5. Similarly to Theorem 6.4 it can be shown that if privately learning a concept class $C$
requires sample complexity $n$, and if there exists a domain element $p_0 \in X$ s.t. $c(p_0) = 0$ for every
$c \in C$, then learning the class of disjunctions of $\ell$ concepts from $C$ requires sample complexity $\ell n$.

Proof of Theorem 6.4. Assume toward a contradiction that there exists an $(\epsilon, \delta)$-differentially private
and $(\alpha, \beta)$-accurate proper learner $\mathcal{A}$ for $C^{\ell}$ using $\ell n/9$ samples. Recall that the task of
privately outputting a good hypothesis on any fixed database is essentially equivalent to the task of
private PAC learning (see Section 5.2). We can assume, therefore, that $\mathcal{A}$ outputs an $\alpha$-consistent
hypothesis for every fixed database of size at least $n' = \ell n$ with probability at least $1 - \beta$.

We construct an algorithm $\mathrm{Solve}_{\mathcal{D}}$ which uses $\mathcal{A}$ in order to find an $(\alpha/\beta)$-consistent threshold
function for databases of size $n$ from $\mathcal{D}$. Algorithm $\mathrm{Solve}_{\mathcal{D}}$ takes as input a set of $n$ labeled examples
in $X$ and applies $\mathcal{A}$ on a database containing $n'$ labeled examples in $X^{\ell}$. The $n$ input points are
embedded along one random axis, and random samples from $\mathcal{D}$ are placed on each of the other axes
(with $n$ labeled points along each axis).


Algorithm 5 $\mathrm{Solve}_{\mathcal{D}}$

Input: Database $D = (x_i, y_i)_{i=1}^{n} \in (X \times \{0,1\})^n$.

1. Initiate $S$ as an empty multiset.

2. Let $r$ be a (uniform) random element from $\{1, 2, \ldots, \ell\}$.

3. For $i = 1$ to $n$, let $\vec{z}_i \in X^{\ell}$ be the vector with $r$-th coordinate $x_i$, and all other coordinates
$p_1$ (recall that every concept in $C$ evaluates to 1 on $p_1$). Add to $S$ the labeled example $(\vec{z}_i, y_i)$.

4. For every axis $t \ne r$:

(a) Let $D' = (x'_i, y'_i)_{i=1}^{n} \in (X \times \{0,1\})^n$ denote a (fresh) sample from $\mathcal{D}$.

(b) For $i = 1$ to $n$, let $\vec{z}'_i \in X^{\ell}$ be the vector whose $t$-th coordinate is $x'_i$, and its other
coordinates are $p_1$. Add to $S$ the labeled example $(\vec{z}'_i, y'_i)$.

5. Let $(h_1, h_2, \ldots, h_{\ell}) = \vec{h} \leftarrow \mathcal{A}(S)$.

6. Return $h_r$.
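
The embedding performed by the algorithm can be sketched in Python as follows. This is a hedged illustration: `sample_from_dist` and `learner` are hypothetical callables standing in for the distribution over databases and for the private learner, axes are indexed from 0 rather than 1, and the function name `solve_d` is ours.

```python
import random


def solve_d(database, ell, p1, sample_from_dist, learner, rng=None):
    """Embed the n input examples along a random axis r, fill every other
    axis with a fresh sample, run the learner, and return its r-th component.

    database:          list of (x, y) pairs with y in {0, 1}
    p1:                domain element on which every concept evaluates to 1
    sample_from_dist:  hypothetical callable (n, rng) -> list of n (x, y) pairs
    learner:           hypothetical callable S -> sequence (h_0, ..., h_{ell-1})
    """
    rng = rng or random.Random()
    n = len(database)
    r = rng.randrange(ell)                      # Step 2 (0-based axis)
    multiset = []                               # Step 1

    def embed(x, axis):
        z = [p1] * ell
        z[axis] = x
        return tuple(z)

    for x, y in database:                       # Step 3
        multiset.append((embed(x, r), y))
    for t in range(ell):                        # Step 4
        if t == r:
            continue
        for x, y in sample_from_dist(n, rng):   # fresh sample for axis t
            multiset.append((embed(x, t), y))
    hypotheses = learner(multiset)              # Step 5
    return hypotheses[r]                        # Step 6
```

Note that one input row influences exactly one element of the multiset handed to the learner, which is the observation behind the privacy claim for this algorithm.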


First observe that $\mathrm{Solve}_{\mathcal{D}}$ is $(\epsilon, \delta)$-differentially private. To see this, note that a change limited
to one input entry affects only one entry of the multiset $S$. Hence, applying the $(\epsilon, \delta)$-differentially
private algorithm $\mathcal{A}$ on $S$ preserves privacy.

Consider the execution of $\mathrm{Solve}_{\mathcal{D}}$ on a database $D$ of size $n$, sampled from $\mathcal{D}$. We first argue
that $\mathcal{A}$ is applied on a multiset $S$ correctly labeled by a concept from $C^{\ell}$. For $1 \le t \le \ell$ let
$(x^t_i, y^t_i)_{i=1}^{n}$ be the sample from $\mathcal{D}$ generated for the axis $t$, let $(\vec{z}^{\,t}_i, y^t_i)_{i=1}^{n}$ denote the corresponding
elements that were added to $S$, and let $c_t$ be s.t. $c_t(x^t_i) = y^t_i$ for every $1 \le i \le n$. Now observe that

$$(c_1, c_2, \ldots, c_{\ell})(\vec{z}^{\,t}_i) = c_1(p_1) \wedge c_2(p_1) \wedge \cdots \wedge c_t(x^t_i) \wedge \cdots \wedge c_{\ell}(p_1) = y^t_i,$$

and hence $S$ is perfectly labeled by $(c_1, c_2, \ldots, c_{\ell}) \in C^{\ell}$.

By the properties of $\mathcal{A}$, with probability at least $1 - \beta$ we have that $\vec{h}$ (from Step 5) is an
$\alpha$-consistent hypothesis for $S$. Assuming that this is the case, there could be at most $\beta\ell$ “bad”
axes on which $\vec{h}$ errs on more than $\alpha n/\beta$ points. Moreover, as $r$ is a random axis, and as the points
along the axis $r$ are distributed exactly like the points along the other axes, the probability that
$r$ is a “bad” axis is at most $\frac{\beta\ell}{\ell} = \beta$. Overall, $\mathrm{Solve}_{\mathcal{D}}$ outputs an $(\alpha/\beta)$-consistent hypothesis with
probability at least $(1 - \beta)^2 > 1 - 2\beta$. This contradicts the hardness of the distribution $\mathcal{D}$. □

Now the proof of Theorem 6.2 follows from the lower bound on the sample complexity of
privately finding an $\alpha$-consistent threshold function (see Section 3.2):

Lemma 6.6 (Follows from Lemma 3.3 and 5.7). There exists a constant $\lambda > 0$ s.t. the following
holds. For every totally ordered data universe $X$ there exists a distribution $\mathcal{D}$ over databases
containing at most $n = \frac{\lambda}{\alpha} \log^* |X|$ labeled examples from $X$ such that every -differentially
private algorithm fails to find an $\alpha$-consistent threshold function for $D \sim \mathcal{D}$ with probability at least
$\frac{1}{4}$.

We remark that, in general, an algorithm for query release can be used to construct a private
learner with similar sample complexity. Hence, Theorem 6.2 also yields the following lower bound
on the sample complexity of releasing approximated answers to queries from $\mathrm{THRESH}^{\ell}_X$.

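Theorem 6.7. For every $n, \ell \in \mathbb{N}$, $\alpha > 0$, and $\delta \le \ell^2/(7500n^2)$, any (^, $\delta$)-differentially private
algorithm for releasing approximated answers for queries from $\mathrm{THRESH}^{\ell}_X$ with $(\alpha, \frac{1}{150})$-accuracy must
have sample complexity $n = \Omega\left(\frac{\ell}{\alpha} \log^* |X|\right)$.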

In order to prove the above theorem we use our lower bound on privately learning $\mathrm{THRESH}^{\ell}_X$
together with the following reduction from private learning to query release.

Lemma 6.8 ([GHRU11, BNS13b]). Let $C$ be a class of predicates. If there exists a
differentially private algorithm capable of releasing queries from $C$ with accuracy and
sample complexity $n$, then there exists a (^, $5\delta$)-differentially private (|, |)-accurate PAC learner
for $C$ with sample complexity $O(n)$.

Proof of Theorem 6.7. Let $\delta \le \ell^2/(7500n^2)$. Combining our lower bound on the sample complexity
of privately learning $\mathrm{THRESH}^{\ell}_X$ (Theorem 6.2) together with the reduction stated in Lemma 6.8, we
get a lower bound of $m = \Omega(\ell \cdot \log^* |X|)$ on the sample complexity of every (^, $\delta$)-differentially
private algorithm for releasing queries from $\mathrm{THRESH}^{\ell}_X$ with $(\frac{1}{150}, \frac{1}{150})$-accuracy.

In order to refine this argument and get a bound that incorporates the approximation parameter,
let $\alpha \le 1/150$, and assume towards contradiction that there exists a (^, $\delta$)-differentially private
algorithm $\tilde{\mathcal{A}}$ for releasing queries from $\mathrm{THRESH}^{\ell}_X$ with $(\alpha, \frac{1}{150})$-accuracy and sample complexity
$n < m/(150\alpha)$.

We will derive a contradiction by using $\tilde{\mathcal{A}}$ in order to construct a $(\frac{1}{150}, \frac{1}{150})$-accurate algorithm
for releasing queries from $\mathrm{THRESH}^{\ell}_X$ with sample complexity less than $m$. Consider the algorithm
$\mathcal{A}$ that on input a database $D$ of size $150\alpha n$, applies $\tilde{\mathcal{A}}$ on a database $\tilde{D}$ containing the elements
in $D$ together with $(1 - 150\alpha)n$ copies of $(\min X)$. Afterwards, algorithm $\mathcal{A}$ answers every query
$c \in \mathrm{THRESH}^{\ell}_X$ with $a_c = \frac{1}{150\alpha}(\tilde{a}_c - 1 + 150\alpha)$, where $\{\tilde{a}_c\}$ are the answers received from $\tilde{\mathcal{A}}$.

Note that as $\tilde{\mathcal{A}}$ is (^, $\delta$)-differentially private, so is $\mathcal{A}$. We now show that $\mathcal{A}$'s output is
$\frac{1}{150}$-accurate for $D$ whenever $\tilde{\mathcal{A}}$'s output is $\alpha$-accurate for $\tilde{D}$, which happens with all but probability
$\frac{1}{150}$. Fix a query $c \in \mathrm{THRESH}^{\ell}_X$ and assume that $c(D) = t/(150\alpha n)$. Note that $c(\min X) = 1$, and
hence, $c(\tilde{D}) = t/n + (1 - 150\alpha)$. By the utility properties of $\tilde{\mathcal{A}}$,

$$
\begin{aligned}
a_c &= \frac{1}{150\alpha}(\tilde{a}_c - 1 + 150\alpha) \\
&\le \frac{1}{150\alpha}(c(\tilde{D}) + \alpha - 1 + 150\alpha) \\
&= \frac{1}{150\alpha}(t/n + \alpha) \\
&= t/(150\alpha n) + 1/150 \\
&= c(D) + 1/150.
\end{aligned}
$$

Similar arguments show that $a_c \ge c(D) - 1/150$, proving that $\mathcal{A}$ is $(1/150, 1/150)$-accurate and
contradicting the lower bound on the sample complexity of such algorithms. □
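
The padding-and-rescaling step of this proof can be checked numerically. The Python sketch below uses one-dimensional thresholds for simplicity; the function names are ours, and `scale` plays the role of $150\alpha$ (the ratio between the size of the input database and the size of the padded one).

```python
def pad_database(database, n, min_point):
    """Append copies of the minimal element until the database has n rows."""
    if len(database) > n:
        raise ValueError("database is already larger than n")
    return list(database) + [min_point] * (n - len(database))


def rescale_answer(padded_answer, scale):
    """Map an answer on the padded database back to the original one:
    a_c = (a~_c - 1 + scale) / scale, with scale = |D| / n (150*alpha in the proof)."""
    return (padded_answer - 1 + scale) / scale


def fraction_at_most(database, x):
    """Exact answer of the threshold query c_x on a database."""
    return sum(1 for y in database if y <= x) / len(database)
```

Rescaling divides by `scale`, so an additive error of $\alpha$ on the padded database becomes an error of $\alpha / (150\alpha) = 1/150$ on the original one, matching the last line of the calculation above.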


7 Mechanism-Dependent Lower Bounds 

7.1 The Undominated Point Problem 

By a reduction to the interior point problem, we can prove an impossibility result for
the problem of privately outputting something that is at least the minimum of a database on an
unbounded domain. Specifically, we show

Theorem 7.1. For every (infinite) totally ordered domain $X$ with no maximum element (e.g.,
$X = \mathbb{N}$) and every $n \in \mathbb{N}$, there is no $(\epsilon, \delta)$-differentially private mechanism $M : X^n \to X$ such
that for every $x = (x_1, \ldots, x_n) \in X^n$,

$$\Pr[M(x) \ge \min_i x_i] \ge 2/3.$$

Besides being a natural relaxation of the interior point problem, this undominated point problem
is of interest because we require new techniques to obtain lower bounds against it. Note that if
we ask for a mechanism that works over a bounded domain (e.g., $[0,1]$), then the problem is
trivial. Moreover, this means that proving a lower bound on the problem when the domain is $\mathbb{N}$
cannot possibly go by way of constructing a single distribution that every differentially private
mechanism fails on. The reason is that for any distribution $\mathcal{D}$ over $\mathbb{N}^n$, there is some number $K$
where $\Pr_{D \sim \mathcal{D}}[\max D > K] \le 1/3$, so the mechanism that always outputs $K$ solves the problem.

Proof. Without loss of generality we may take $X = \mathbb{N}$, since every totally ordered domain with
no maximum element contains an infinite sequence $x_0 < x_1 < x_2 < x_3 < \cdots$. To prove our lower
bound we need to take advantage of the fact that we only need to show that for each differentially
private mechanism $M$ there exists a distribution, depending on $M$, over which $M$ fails. To this
end, for an increasing function $T : \mathbb{N} \to \mathbb{N}$, we say that a mechanism $M : \mathbb{N}^n \to \mathbb{N}$ is “$T$-bounded”
if $\Pr[M(x_1, \ldots, x_n) \ge T(\max_i x_i)] \le 1/8$. That is, $M$ is $T$-bounded if it is unlikely to output
anything larger than $T$ applied to the max of its input. Note that any mechanism is $T$-bounded
for some function $T$.

We can then reduce the impossibility of the undominated point problem for $T$-bounded mechanisms
to our lower bound for the interior point problem. First, fix a function $T$. Suppose for the
sake of contradiction that there were a $T$-bounded mechanism $M$ that solves the undominated point
problem on $(x_1, \ldots, x_n)$ with probability at least 7/8. Then by a union bound, $M$ must output
something in the interval $[\min_i x_i, T(\max_i x_i))$ with probability at least 3/4. Now, for $d \in \mathbb{N}$, consider
the data universe $X_d = \{1, T(1), T(T(1)), T(T(T(1))), \ldots, T^{(d-1)}(1)\}$ and the differentially
private mechanism $M' : X_d^n \to X_d$ that, on input a database $D$, runs $M(D)$ and rounds the answer
down to the nearest $T^{(j)}(1)$. Then $M'$ solves the interior point problem on the domain $X_d$ with
probability at least 3/4. By our lower bound for the interior point problem we have $n = \Omega(\log^* d)$,
which is a contradiction since $n$ is fixed and $d$ is arbitrary.

□ 
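
The rounding step in this reduction can be illustrated with a short Python sketch. We assume here that $X_d$ consists of the first $d$ iterates of $T$ starting from 1 and that $T(x) > x$ along this orbit; the function names are ours.

```python
import bisect


def tower_domain(T, d):
    """Return [1, T(1), T(T(1)), ...] with d elements (assumed form of X_d)."""
    points = [1]
    while len(points) < d:
        nxt = T(points[-1])
        if nxt <= points[-1]:
            raise ValueError("T must satisfy T(x) > x along the orbit of 1")
        points.append(nxt)
    return points


def round_down(value, domain):
    """Largest element of the sorted domain that is <= value."""
    idx = bisect.bisect_right(domain, value) - 1
    if idx < 0:
        raise ValueError("value lies below the smallest domain point")
    return domain[idx]
```

If the database lies in the domain and the mechanism's output falls in $[\min_i x_i, T(\max_i x_i))$, then rounding down lands between the minimum and the maximum of the database, i.e. on an interior point. For example, with `T = lambda x: 2 * x + 1` and database `[3, 7]`, an output of 14 is rounded down to 7.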


7.2 Properly Learning Point Functions with Pure Differential Privacy 

Using similar ideas as in the above section, we revisit the problem of privately learning the concept
class $\mathrm{POINT}_{\mathbb{N}}$ of point functions over the natural numbers. Recall that a point function $c_x$ is
defined by $c_x(y) = 1$ if $x = y$ and evaluates to 0 otherwise. Beimel et al. [BKN10] used a
packing argument to show that $\mathrm{POINT}_{\mathbb{N}}$ cannot be properly learned with pure $\epsilon$-differential privacy
(i.e., $\delta = 0$). However, more recent work of Beimel et al. [BNS13a] exhibited an $\epsilon$-differentially
private improper learner for $\mathrm{POINT}_{\mathbb{N}}$ with sample complexity $O(1)$. Their construction required
an uncountable hypothesis class, with each concept being described by a real number. This left
open the question of whether $\mathrm{POINT}_{\mathbb{N}}$ could be learned with a countable hypothesis class, with each
concept having a finite description length.

We resolve this question in the negative. Specifically, we show that it is impossible to learn 
(even improperly) point functions over an infinite domain with pure differential privacy using a 
countable hypothesis class. 

Theorem 7.2. Let $X$ be an infinite domain, let $H$ be a countable collection of hypotheses $\{h : X \to
\{0,1\}\}$, and let $\epsilon > 0$. Then there is no $\epsilon$-differentially private $(1/3, 1/3)$-accurate PAC learner for
points over $X$ using the hypothesis class $H$.

Remark 7.3. A learner implemented by an algorithm (i.e. a probabilistic Turing machine) must 
use a hypothesis class where each hypothesis has a finite description. Note that the standard proper 
learner for POINTx can be implemented by an algorithm. However, a consequence of our result is 
that there is no algorithm for privately learning POINTx. 

Proof. For clarity, and without loss of generality, we assume that $X = \mathbb{N}$. Suppose for the sake
of contradiction that we had an $\epsilon$-differentially private learner $M$ for point functions over $\mathbb{N}$ using
hypothesis class $H$. Since $H$ is countable, there is a finite subset of hypotheses $H'$ such that
$M((0,1)^n) \in H'$ with probability at least 5/6, where $(0,1)^n$ is the dataset where all examples are
the point 0 with the label 1. Indeed $\sum_{h \in H} \Pr[M((0,1)^n) = h] = 1$, so some finite partial sum of
this series is at least 5/6. Now to each point $x \in \mathbb{N}$ we will associate a distribution $\mathcal{D}_x$ on $\mathbb{N}$ and
let $G_x \subseteq H'$ be the set of hypotheses $h$ in the finite set $H'$ for which

$$\Pr_{y \sim \mathcal{D}_x}[c_x(y) = h(y)] \ge 2/3.$$


We establish the following claim. 

Claim 7.4. There is an infinite sequence of points $x_1, x_2, x_3, \ldots$ together with distributions $\mathcal{D}_i :=
\mathcal{D}_{x_i}$ such that the sets $G_i := G_{x_i}$ are all disjoint.

Given the claim, the result follows by a packing argument [HT10, BKN10]. By the utility of
$M$, for each $\mathcal{D}_i$ there is a database $R_i \in (\mathbb{N} \times \{0,1\})^n$ in the support of $\mathcal{D}_i$ such that $\Pr[M(R_i) \in
G_i] \ge 2/3 - 1/6 = 1/2$. By changing the database $R_i$ to $(0,1)^n$ one row at a time while applying
the differential privacy constraint, we see that

$$\Pr[M((0,1)^n) \in G_i] \ge \frac{1}{2} e^{-\epsilon n}.$$

It is impossible for this to hold for infinitely many disjoint sets $G_i$. □

Proof of Claim 7.4. We inductively construct the sequence $(x_i)$, starting with $x_1 = 0$. Now suppose
we have constructed $x_1, \ldots, x_i$ with corresponding good hypothesis sets $G_1, \ldots, G_i$. Let $B = \bigcup_{j=1}^{i} G_j$
be the set of hypotheses we wish to avoid. Note that $B$ is a finite set of hypotheses, so there are
some $x, x' \in \mathbb{N}$ for which every $h \in B$ with $h(x) = 1$ also has $h(x') = 1$. Let $x_{i+1} = x$ and let $\mathcal{D}_{i+1}$ be
distributed uniformly over $x$ and $x'$. Then for every hypothesis $h \in B$,

\[ \Pr_{y \sim \mathcal{D}_{i+1}}[c_{x_{i+1}}(y) = h(y)] \leq 1/2, \]

and hence $G_{i+1}$ is disjoint from the preceding $G_j$'s. □
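
One way to see why such a pair of points exists: each point induces a vector of labels, one per hypothesis in the finite set $B$, and only $2^{|B|}$ such vectors are possible, so two distinct points must share a vector. The sketch below searches for such a pair and checks the resulting agreement bound. The function names and the toy hypotheses are illustrative assumptions and do not appear in the paper.

```python
from typing import Callable, Dict, Sequence, Tuple

Hypothesis = Callable[[int], int]


def find_confusable_pair(hypotheses: Sequence[Hypothesis]) -> Tuple[int, int]:
    """Return distinct naturals x < x' on which every hypothesis takes the same value."""
    seen: Dict[Tuple[int, ...], int] = {}
    x = 0
    while True:  # stops after at most 2**len(hypotheses) + 1 points (pigeonhole)
        signature = tuple(h(x) for h in hypotheses)
        if signature in seen:
            return seen[signature], x
        seen[signature] = x
        x += 1


def agreement_with_point(h: Hypothesis, x: int, x_prime: int) -> float:
    """Pr[c_x(y) = h(y)] for y uniform on {x, x'}, where c_x(y) = 1 iff y = x."""
    return ((h(x) == 1) + (h(x_prime) == 0)) / 2


# A toy finite set B of hypotheses (hypothetical, for illustration only).
B = [lambda y: int(y == 0), lambda y: int(y == 3), lambda y: int(y % 2 == 0)]
x, x_prime = find_confusable_pair(B)
worst = max(agreement_with_point(h, x, x_prime) for h in B)
```

Because every hypothesis in the set takes the same value on both points, it can match the point function $c_x$ on at most one of them, which is exactly the $1/2$ bound used in the proof.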

A The Choosing Mechanism


We supply the proofs of privacy and utility for the choosing mechanism. 

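Proof of Lemma 3.6. Let $A$ denote the choosing mechanism (Algorithm 2). Let $S, S'$ be neighboring
databases of $m$ elements. We need to show that $\Pr[A(S) \in R] \leq \exp(\varepsilon) \cdot \Pr[A(S') \in R] + \delta$ for
every set of outputs $R \subseteq F \cup \{\bot\}$. Note first that $\mathrm{OPT}(S) = \max_{f \in F}\{q(S,f)\}$ has sensitivity at
most 1, so by the properties of the Laplace Mechanism,

\begin{align*}
\Pr[A(S) = \bot] &= \Pr\left[\widetilde{\mathrm{OPT}}(S) < \frac{8}{\varepsilon}\ln\left(\frac{4k}{\beta\varepsilon\delta}\right)\right] \\
&\leq \exp\left(\frac{\varepsilon}{4}\right) \cdot \Pr\left[\widetilde{\mathrm{OPT}}(S') < \frac{8}{\varepsilon}\ln\left(\frac{4k}{\beta\varepsilon\delta}\right)\right] \\
&= \exp\left(\frac{\varepsilon}{4}\right) \cdot \Pr[A(S') = \bot]. \tag{3}
\end{align*}

Similarly, we have $\Pr[A(S) \neq \bot] \leq \exp(\varepsilon/4) \Pr[A(S') \neq \bot]$. Thus, we may assume below that
$\bot \notin R$. (If $\bot \in R$, then we can write $\Pr[A(S) \in R] = \Pr[A(S) = \bot] + \Pr[A(S) \in R \setminus \{\bot\}]$, and
similarly for $S'$.)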


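Case (a): $\mathrm{OPT}(S) < \frac{4}{\varepsilon}\ln\left(\frac{4k}{\beta\varepsilon\delta}\right)$. It holds that

\begin{align*}
\Pr[A(S) \in R] &\leq \Pr[A(S) \neq \bot] \\
&\leq \Pr\left[\mathrm{Lap}\left(\frac{4}{\varepsilon}\right) > \frac{4}{\varepsilon}\ln\left(\frac{4k}{\beta\varepsilon\delta}\right)\right] \\
&\leq \delta \leq \Pr[A(S') \in R] + \delta.
\end{align*}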


Case (b): $\mathrm{OPT}(S) \geq \frac{4}{\varepsilon}\ln\left(\frac{4k}{\beta\varepsilon\delta}\right)$. Let $G(S)$ and $G(S')$ be the sets used in step 2 in the execution
on $S$ and on $S'$ respectively. We will show that the following two facts hold:

Fact 1: For every $f \in G(S) \setminus G(S')$, it holds that $\Pr[A(S) = f] \leq \delta/k$.

Fact 2: For every possible output $f \in G(S) \cap G(S')$, it holds that $\Pr[A(S) = f] \leq e^{\varepsilon} \cdot \Pr[A(S') = f]$.

We first show that the two facts imply that the lemma holds for Case (b). Let $B = G(S) \setminus G(S')$,
and note that as $q$ is of $k$-bounded growth, $|B| \leq k$. Using the above two facts, for every set of
outputs $R \subseteq F$ we have

\begin{align*}
\Pr[A(S) \in R] &= \Pr[A(S) \in R \setminus B] + \sum_{f \in R \cap B} \Pr[A(S) = f] \\
&\leq e^{\varepsilon} \cdot \Pr[A(S') \in R \setminus B] + |R \cap B| \cdot \frac{\delta}{k} \\
&\leq e^{\varepsilon} \cdot \Pr[A(S') \in R] + \delta.
\end{align*}


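To prove Fact 1, let $f \in G(S) \setminus G(S')$. That is, $q(S,f) \geq 1$ and $q(S',f) = 0$. As $q$ has sensitivity
at most 1, it must be that $q(S,f) = 1$. As there exists $\hat{f} \in F$ with $q(S,\hat{f}) \geq \frac{4}{\varepsilon}\ln\left(\frac{4k}{\beta\varepsilon\delta}\right)$, we have that

\[ \Pr[A(S) = f] \leq \Pr[\text{The exponential mechanism chooses } f] \leq \frac{\exp\left(\frac{\varepsilon}{4} \cdot 1\right)}{\exp\left(\frac{\varepsilon}{4} \cdot \frac{4}{\varepsilon}\ln\left(\frac{4k}{\beta\varepsilon\delta}\right)\right)} = \exp\left(\frac{\varepsilon}{4}\right) \frac{\beta\varepsilon\delta}{4k}, \]

which is at most $\delta/k$ for $\varepsilon \leq 2$.

To prove Fact 2, let $f \in G(S) \cap G(S')$ be a possible output of $A(S)$. We use the following
Fact 3, proved below.

Fact 3: $\sum_{h \in G(S')} \exp\left(\frac{\varepsilon}{4} q(S',h)\right) \leq e^{\varepsilon/2} \cdot \sum_{h \in G(S)} \exp\left(\frac{\varepsilon}{4} q(S,h)\right)$.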

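Using Fact 3, for every possible output $f \in G(S) \cap G(S')$ we have that

\begin{align*}
\frac{\Pr[A(S) = f]}{\Pr[A(S') = f]}
&= \left(\Pr[A(S) \neq \bot] \cdot \frac{\exp\left(\frac{\varepsilon}{4} q(S,f)\right)}{\sum_{h \in G(S)} \exp\left(\frac{\varepsilon}{4} q(S,h)\right)}\right) \Bigg/ \left(\Pr[A(S') \neq \bot] \cdot \frac{\exp\left(\frac{\varepsilon}{4} q(S',f)\right)}{\sum_{h \in G(S')} \exp\left(\frac{\varepsilon}{4} q(S',h)\right)}\right) \\
&= \frac{\Pr[A(S) \neq \bot]}{\Pr[A(S') \neq \bot]} \cdot \frac{\exp\left(\frac{\varepsilon}{4} q(S,f)\right)}{\exp\left(\frac{\varepsilon}{4} q(S',f)\right)} \cdot \frac{\sum_{h \in G(S')} \exp\left(\frac{\varepsilon}{4} q(S',h)\right)}{\sum_{h \in G(S)} \exp\left(\frac{\varepsilon}{4} q(S,h)\right)} \\
&\leq e^{\varepsilon/4} \cdot e^{\varepsilon/4} \cdot e^{\varepsilon/2} = e^{\varepsilon}.
\end{align*}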


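We now prove Fact 3. Let $X = \sum_{h \in G(S)} \exp\left(\frac{\varepsilon}{4} q(S,h)\right)$. Since there exists a solution $\hat{f}$ s.t.
$q(S,\hat{f}) \geq \frac{4}{\varepsilon}\ln\left(\frac{4k}{\beta\varepsilon\delta}\right)$, we have $X \geq \exp\left(\frac{\varepsilon}{4} \cdot \frac{4}{\varepsilon}\ln\left(\frac{4k}{\beta\varepsilon\delta}\right)\right) \geq \frac{4k}{\varepsilon}$.

Now, recall that $q$ is of $k$-bounded growth, so $|G(S') \setminus G(S)| \leq k$, and every $h \in (G(S') \setminus G(S))$
satisfies $q(S',h) = 1$. Hence,

\begin{align*}
\sum_{h \in G(S')} \exp\left(\frac{\varepsilon}{4} q(S',h)\right)
&\leq k \cdot \exp\left(\frac{\varepsilon}{4}\right) + \sum_{h \in G(S') \cap G(S)} \exp\left(\frac{\varepsilon}{4} q(S',h)\right) \\
&\leq k \cdot \exp\left(\frac{\varepsilon}{4}\right) + \exp\left(\frac{\varepsilon}{4}\right) \cdot \sum_{h \in G(S') \cap G(S)} \exp\left(\frac{\varepsilon}{4} q(S,h)\right) \\
&\leq k \cdot \exp\left(\frac{\varepsilon}{4}\right) + \exp\left(\frac{\varepsilon}{4}\right) \cdot \sum_{h \in G(S)} \exp\left(\frac{\varepsilon}{4} q(S,h)\right) \\
&= k \cdot e^{\varepsilon/4} + e^{\varepsilon/4} \cdot X \leq e^{\varepsilon/2} X,
\end{align*}

where the last inequality follows from the fact that $X \geq 4k/\varepsilon$. This concludes the proof of Fact 3,
and completes the proof of the lemma. □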
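
The last inequality in the proof of Fact 3 rearranges to $k \leq (e^{\varepsilon/4} - 1) X$, which follows from $e^{t} - 1 \geq t$ once $X \geq 4k/\varepsilon$. The short numerical check below evaluates the gap at the boundary value $X = 4k/\varepsilon$ for a few parameter choices. It is only a sanity check under our reading of the constants; the helper name `fact3_gap` is hypothetical.

```python
import math


def fact3_gap(k: int, eps: float, x: float) -> float:
    """Return e^{eps/2} * X - (k * e^{eps/4} + e^{eps/4} * X).

    A nonnegative value means the final inequality in Fact 3 holds for these inputs.
    """
    return math.exp(eps / 2) * x - (k * math.exp(eps / 4) + math.exp(eps / 4) * x)


# Evaluate the gap at the smallest admissible value X = 4k / eps.
gaps = {
    (k, eps): fact3_gap(k, eps, 4 * k / eps)
    for k in (1, 5, 100)
    for eps in (0.01, 0.5, 2.0)
}
```

Since the gap equals $e^{\varepsilon/4}\left((e^{\varepsilon/4} - 1)X - k\right)$, it only grows as $X$ increases beyond $4k/\varepsilon$.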
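
The utility analysis for the choosing mechanism is rather straightforward:

Proof of Lemma 3.7. Recall that the mechanism defines $\widetilde{\mathrm{OPT}}(S)$ as $\mathrm{OPT}(S) + \mathrm{Lap}\left(\frac{4}{\varepsilon}\right)$. Note that
the mechanism succeeds whenever $\widetilde{\mathrm{OPT}}(S) \geq \frac{8}{\varepsilon}\ln\left(\frac{4k}{\beta\varepsilon\delta}\right)$. This happens provided the Lap random
variable has magnitude at most $\frac{8}{\varepsilon}\ln\left(\frac{4k}{\beta\varepsilon\delta}\right)$, which happens with probability at least $(1 - \beta)$. □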

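Proof of Lemma 3.8. Note that if $\mathrm{OPT}(S) < \frac{16}{\varepsilon}\ln\left(\frac{4km}{\beta\varepsilon\delta}\right)$, then every solution is a good output,
and the mechanism cannot fail. Assume, therefore, that there exists a solution $f$ s.t. $q(S,f) \geq \frac{16}{\varepsilon}\ln\left(\frac{4km}{\beta\varepsilon\delta}\right)$,
and recall that the mechanism defines $\widetilde{\mathrm{OPT}}(S)$ as $\mathrm{OPT}(S) + \mathrm{Lap}\left(\frac{4}{\varepsilon}\right)$. As in the proof of
Lemma 3.7, with probability at least $1 - \beta/2$, we have $\widetilde{\mathrm{OPT}}(S) \geq \frac{8}{\varepsilon}\ln\left(\frac{4k}{\beta\varepsilon\delta}\right)$. Assuming this event
occurs, we will show that with probability at least $1 - \beta/2$, the exponential mechanism chooses a
solution $f$ s.t. $q(S,f) \geq \mathrm{OPT}(S) - \frac{16}{\varepsilon}\ln\left(\frac{4km}{\beta\varepsilon\delta}\right)$.

By the growth-boundedness of $q$, and as $S$ is of size $m$, there are at most $km$ possible solutions
$f$ with $q(S,f) > 0$. That is, $|G(S)| \leq km$. By the properties of the Exponential Mechanism, we
obtain a solution as desired with probability at least

\[ 1 - km \cdot \exp\left(-\frac{\varepsilon}{4} \cdot \frac{16}{\varepsilon}\ln\left(\frac{4km}{\beta\varepsilon\delta}\right)\right) \geq 1 - \frac{\beta}{2}. \]

By a union bound, we get that the choosing mechanism outputs a good solution with probability
at least $(1 - \beta)$. □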
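
For orientation, the two steps that the proofs in this appendix refer to (a noisy threshold test on $\mathrm{OPT}(S)$, then the exponential mechanism over $G(S)$) can be sketched as follows. This is a minimal sketch reconstructed from those proofs, not a transcription of Algorithm 2, which remains the authoritative description. The constants $4/\varepsilon$ and $8/\varepsilon$ follow our reading of the proofs, and the function names, the finite list `solutions`, the use of `None` for the failure symbol, and the counting quality function in the usage lines are all illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Lap(scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def halting_threshold(k: int, beta: float, eps: float, delta: float) -> float:
    """Threshold (8/eps) * ln(4k / (beta * eps * delta)), as read from the proofs."""
    return (8.0 / eps) * math.log(4.0 * k / (beta * eps * delta))


def choosing_mechanism(
    database: Sequence[int],
    solutions: List[int],                       # assumed finite, for illustration
    quality: Callable[[Sequence[int], int], int],
    k: int, beta: float, eps: float, delta: float,
    rng: random.Random,
) -> Optional[int]:
    # Step 1: noisy test on the best achievable quality.
    opt = max(quality(database, f) for f in solutions)
    noisy_opt = opt + laplace(4.0 / eps, rng)
    if noisy_opt < halting_threshold(k, beta, eps, delta):
        return None  # stands in for the failure symbol
    # Step 2: exponential mechanism over G(S) = {f : q(S, f) >= 1}.
    good = [f for f in solutions if quality(database, f) >= 1]
    if not good:
        return None
    top = max(quality(database, f) for f in good)
    # Subtracting `top` rescales all weights equally and avoids overflow.
    weights = [math.exp(eps / 4.0 * (quality(database, f) - top)) for f in good]
    return rng.choices(good, weights=weights, k=1)[0]


def count_quality(db: Sequence[int], f: int) -> int:
    """Toy quality function: number of occurrences of f in the database."""
    return sum(1 for row in db if row == f)


rng = random.Random(0)
heavy = choosing_mechanism([7] * 2000 + [3] * 5, [3, 7, 9], count_quality,
                           k=1, beta=0.1, eps=1.0, delta=1e-6, rng=rng)
sparse = choosing_mechanism([3] * 5, [3, 7, 9], count_quality,
                            k=1, beta=0.1, eps=1.0, delta=1e-6, rng=rng)
```

In the first call the best quality is far above the threshold, so the mechanism proceeds and the exponential mechanism returns the dominant element except with negligible probability. In the second call the noisy test fails and the sketch returns `None`.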


B Interior Point Fingerprinting Codes 

Fingerprinting codes were introduced by Boneh and Shaw [BS98] to address the problem of
watermarking digital content. Suppose a content distributor wishes to distribute a piece of digital
content to n legitimate users in such a way that any pirated copy of that content can be traced back
to any user who helped in producing the copy. A fingerprinting code is a scheme for assigning each
of the n users a codeword that can be hidden in their copy of the content, and then be uniquely traced
back to the identity of that user. Informally, a fingerprinting code is fully collusion-resistant if,
when an arbitrary coalition T of users combine their codewords to produce a new pirate codeword,
the pirate codeword can still be successfully traced to a member of T, provided the pirate codeword
satisfies a certain marking assumption. Traditionally, this marking assumption requires that if all 
users in T see the same bit b at index j of their codewords, then index j of their combined codeword 
must also be b. 

Recent work has shown how to use fingerprinting codes to obtain lower bounds in differential 
privacy [BUV14, DTTZ14, BST14]. Roughly speaking, these works show how any algorithm with 
nontrivial accuracy for a given task can be used to create a pirate algorithm that satisfies the 
marking assumption for a fingerprinting code. The security of the fingerprinting code means that 
the output of this algorithm can be traced back to one of its inputs. This implies that the algorithm 
is not differentially private. 

We show how our lower bound for privately solving the interior point problem can also be 
proved by the construction of an object we call an interior point fingerprinting code. The difference 
between this object and a traditional fingerprinting code lies in the marking assumption. Thinking 
of our codewords as being from an ordered domain X, our marking assumption is that the codeword 
produced by a set T of users must be an interior point of their codewords. The full definition of
the code is as follows. 

Definition B.1. For a totally ordered domain $X$, an interior point fingerprinting code over $X$
consists of a pair of randomized algorithms (Gen, Trace) with the following syntax.

• $\mathrm{Gen}_n$ samples a codebook $C = (x_1, \ldots, x_n) \in X^n$.

• $\mathrm{Trace}_n(x)$ takes as input a "codeword" $x \in X$ and outputs either a user $i \in [n]$ or a failure
symbol $\bot$.

The algorithms Gen and Trace are allowed to share a common state (e.g. their random coin tosses).

The adversary to a fingerprinting code consists of a subset $T \subseteq [n]$ of users and a pirate
algorithm $A : X^{|T|} \to X$. The algorithm $A$ is given $C|_T$, i.e. the codewords $x_i$ for $i \in T$, and its
output $x \leftarrow_R A(C|_T)$ is said to be "feasible" if $x \in [\min_{i \in T} x_i, \max_{i \in T} x_i]$. The security guarantee
of a fingerprinting code is that for all coalitions $T \subseteq [n]$ and all pirate algorithms $A$, if $x = A(C|_T)$,
then we have

1. Completeness: $\Pr[\mathrm{Trace}(x) = \bot \wedge x \text{ feasible}] \leq \gamma$, where $\gamma \in [0,1]$ is the completeness error.

2. Soundness: $\Pr[\mathrm{Trace}(x) \in [n] \setminus T] \leq \xi$, where $\xi \in [0,1]$ is the soundness error.

The probabilities in both cases are taken over the coins of Gen, Trace, and $A$.

Remark B.2. We note that an interior point fingerprinting code could also be interpreted as an
ordinary fingerprinting code (using the traditional marking assumption) with codewords of length
$|X|$ of the form 000011111. As an example for using such a code, consider a vendor interested
in fingerprinting movies. Using an interior point fingerprinting code, the vendor could produce 
fingerprinted copies by simply splicing two versions of the movie. 
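
The correspondence in Remark B.2 can be checked on a small example. In the sketch below a point is encoded as a run of zeros followed by ones, and a step-shaped pirate word satisfies the traditional marking assumption exactly when its switch position is an interior point of the coalition's switch positions. The encoding convention (the number of leading zeros equals the point) and all function names are assumptions made for illustration.

```python
from typing import List, Sequence


def step_codeword(x: int, length: int) -> List[int]:
    """Encode x in {0, ..., length} as x zeros followed by (length - x) ones."""
    return [0] * x + [1] * (length - x)


def satisfies_marking(pirate: Sequence[int], coalition: Sequence[Sequence[int]]) -> bool:
    """Traditional marking assumption: wherever all coalition codewords agree
    on a bit, the pirate codeword must carry that same bit."""
    for j, bit in enumerate(pirate):
        column = {word[j] for word in coalition}
        if len(column) == 1 and bit not in column:
            return False
    return True


def is_interior_point(x: int, points: Sequence[int]) -> bool:
    return min(points) <= x <= max(points)


length = 9
coalition_points = [2, 5, 6]
coalition_words = [step_codeword(p, length) for p in coalition_points]

# Switch positions whose step codeword the traditional marking assumption allows.
feasible = [
    t for t in range(length + 1)
    if satisfies_marking(step_codeword(t, length), coalition_words)
]
```

For this coalition the feasible switch positions are 2 through 6, which are exactly the interior points of {2, 5, 6}.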

We now argue as in [BUV14] that the existence of an interior point fingerprinting code yields a 
lower bound for privately solving the interior point problem. 

Lemma B.3. Let $\varepsilon \leq 1$, $\delta \leq 1/(12n)$, $\gamma \leq 1/2$ and $\xi \leq 1/(33n)$. If there is an interior point
fingerprinting code on domain $X$ for $n$ users with completeness error $\gamma$ and soundness error $\xi$,
then there is no $(\varepsilon,\delta)$-differentially private algorithm that, with probability at least 2/3, solves the
interior point problem on $X$ for databases of size $n - 1$.

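Proof. Suppose for the sake of contradiction that there were a differentially private $A$ for solving
the interior point problem on $X^{n-1}$. Let $T = [n-1]$, and let $x = A(C|_T)$ for a codebook $C \leftarrow_R \mathrm{Gen}$.
Since $A$ fails to output a feasible $x$ with probability at most 1/3, completeness gives

\[ 1 - \gamma \leq \Pr[\mathrm{Trace}(x) \neq \bot \vee x \text{ not feasible}] \leq \Pr[\mathrm{Trace}(x) \neq \bot] + \frac{1}{3}. \]

Therefore, there exists some $i^* \in [n]$ such that

\[ \Pr[\mathrm{Trace}(x) = i^*] \geq \frac{1}{n} \cdot \left(\frac{2}{3} - \gamma\right) \geq \frac{1}{6n}. \]

Now consider the coalition $T'$ obtained by replacing user $i^*$ with user $n$. Let $x' = A(C|_{T'})$, again
for a random codebook $C \leftarrow_R \mathrm{Gen}$. Since $A$ is differentially private,

\[ \Pr[\mathrm{Trace}(x') = i^*] \geq e^{-\varepsilon} \cdot \left(\Pr[\mathrm{Trace}(x) = i^*] - \delta\right) > \frac{1}{33n}, \]

contradicting the soundness of the interior point fingerprinting code. □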
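
The constants in the final step can be checked directly. The following worked computation uses only the hypotheses $\varepsilon \leq 1$, $\delta \leq 1/(12n)$ and $\gamma \leq 1/2$ of Lemma B.3, together with the numerical value $12e \approx 32.62$.

```latex
\[
  e^{-\varepsilon}\left(\Pr[\mathrm{Trace}(x) = i^*] - \delta\right)
  \;\geq\; e^{-1}\left(\frac{1}{6n} - \frac{1}{12n}\right)
  \;=\; \frac{1}{12\,e\,n}
  \;\approx\; \frac{1}{32.62\,n}
  \;>\; \frac{1}{33n}.
\]
```

This is why a soundness error below $1/(33n)$ suffices for the contradiction.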

We now show how to construct an interior point fingerprinting code, using similar ideas as in
the proof of Lemma 3.3. For $n$ users, the codewords lie in a domain with size an exponential tower
in $n$, allowing us to recover the $\log^* |X|$ lower bound for interior point queries.

Lemma B.4. For every $n \in \mathbb{N}$ and $\xi > 0$ there is an interior point fingerprinting code for $n$ users
with completeness $\gamma = 0$ and soundness $\xi$ on a domain $X_n$ of size $|X_n| \leq \mathrm{tower}^{(n + \log^*(2n^2/\xi))}(1)$.

Proof. Let $b(n) = 2n^2/\xi$, and define the function $S$ recursively by $S(1) = 1$ and $S(n+1) = b(n)^{S(n)}$.
By induction on $n$, we will construct codes for $n$ users over a domain of size $S(n)$ with perfect
completeness and soundness at most $\sum_{j=1}^{n-1} \frac{1}{b(j)} < \xi$. First note that there is a code with perfect
completeness and perfect soundness for $n = 1$ user over a domain of size $S(1) = 1$. Suppose we
have defined the behavior of $(\mathrm{Gen}_n, \mathrm{Trace}_n)$ for $n$ users. Then we define

• $\mathrm{Gen}_{n+1}$ samples $C' = (x'_1, \ldots, x'_n) \leftarrow_R \mathrm{Gen}_n$ and $x_{n+1} \leftarrow_R [S(n+1)]$. For each $i = 1, \ldots, n$,
let $x_i$ be a base-$b(n)$ number (written $x_i^{(1)} x_i^{(2)} \cdots x_i^{(S(n))}$, where $x_i^{(1)}$ is the most significant
digit) that agrees with $x_{n+1}$ in the $x'_i$ most-significant digits, and has random entries from
$[b(n)]$ at every index thereafter. The output codebook is $C = (x_1, \ldots, x_{n+1})$.

• $\mathrm{Trace}_{n+1}(x)$ retrieves the codebook $C$ from its shared state with $\mathrm{Gen}_{n+1}$. Let $M$ be the
maximum number of digits to which any $x_i$ (for $i = 1, \ldots, n$) agrees with $x_{n+1}$. If $x$ agrees
with $x_{n+1}$ on more than $M$ digits, accuse user $n+1$. Otherwise, let $x'$ be the number of indices
on which $x$ agrees with $x_{n+1}$, and run $\mathrm{Trace}_n(x')$ with respect to codebook $C' = (x'_1, \ldots, x'_n)$.
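
The recursive construction can be sketched in code for small parameters. The sketch below is a hedged illustration, not the paper's implementation. It assumes $b(n) = 2n^2/\xi$ rounded up to an integer, a domain $\{0, \ldots, S(n) - 1\}$ at each level, "agreement" measured as the length of the common prefix of most-significant digits, and a base case that accuses the only user. All function names are hypothetical.

```python
import math
import random
from typing import List


def base(n: int, xi: float) -> int:
    """b(n) = 2 n^2 / xi, rounded up to an integer (the rounding is an assumption)."""
    return math.ceil(2 * n * n / xi)


def size(n: int, xi: float) -> int:
    """S(1) = 1 and S(n + 1) = b(n) ** S(n)."""
    s = 1
    for j in range(1, n):
        s = base(j, xi) ** s
    return s


def digits(value: int, b: int, length: int) -> List[int]:
    """Base-b digits of value, most significant first, padded to `length`."""
    out = []
    for _ in range(length):
        value, d = divmod(value, b)
        out.append(d)
    return out[::-1]


def number(ds: List[int], b: int) -> int:
    value = 0
    for d in ds:
        value = value * b + d
    return value


def agreement(u: List[int], v: List[int]) -> int:
    """Length of the common prefix of two digit strings."""
    count = 0
    for a, c in zip(u, v):
        if a != c:
            break
        count += 1
    return count


def gen(n: int, xi: float, rng: random.Random) -> List[List[int]]:
    """Return codebooks [C_1, ..., C_n], where C_m is the codebook for m users."""
    books = [[0]]  # one user over a domain of size S(1) = 1
    for m in range(1, n):
        b, length = base(m, xi), size(m, xi)
        last = [rng.randrange(b) for _ in range(length)]  # digits of x_{m+1}
        new = []
        for prev in books[-1]:  # prev plays the role of x'_i
            tail = [rng.randrange(b) for _ in range(length - prev)]
            new.append(number(last[:prev] + tail, b))
        new.append(number(last, b))
        books.append(new)
    return books


def trace(x: int, books: List[List[int]], xi: float) -> int:
    """Accuse a user in 1..n by unwinding the recursion from the top level."""
    for m in range(len(books) - 1, 0, -1):
        b, length = base(m, xi), size(m, xi)
        code = [digits(c, b, length) for c in books[m]]
        last = code[-1]
        threshold = max(agreement(c, last) for c in code[:-1])  # the value M
        agree = agreement(digits(x, b, length), last)
        if agree > threshold:
            return m + 1
        x = agree  # the value x' handed to the code for m users
    return 1  # base case: assumed to accuse the only user


books = gen(3, 0.5, random.Random(1))
accused = trace(books[2][2], books, 0.5)
```

With $\xi = 1/2$ and three users the top-level domain already has $16^4 = 65536$ elements, which illustrates how quickly the domain size grows with the number of users.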

We reduce the security of this scheme to that of $(\mathrm{Gen}_n, \mathrm{Trace}_n)$. To check completeness, let $T \subseteq
[n+1]$ be a pirate coalition and let $A$ be a pirate algorithm. Consider the pirate algorithm $A'$ for
codes on $n$ users that, given a set of codewords $C'|_{T'}$ where $T' = T \setminus \{n+1\}$, simulates $\mathrm{Gen}_{n+1}$ to
produce a set of codewords $C|_T$ and outputs the number $x'$ of indices on which $x = A(C|_T)$ agrees
with $x_{n+1}$.

If $x$ is feasible for $C|_T$ and $x$ agrees with $x_{n+1}$ on at most $M$ digits, then $x'$ is feasible for $C'|_{T'}$. Therefore,

\begin{align*}
\Pr[\mathrm{Trace}_{n+1}(x) = \bot \wedge x \text{ feasible for } C|_T]
&= \Pr[x \text{ agrees with } x_{n+1} \text{ on at most } M \text{ digits} \wedge \mathrm{Trace}_n(x') = \bot \wedge x \text{ feasible for } C|_T] \\
&\leq \Pr[\mathrm{Trace}_n(x') = \bot \wedge x' \text{ feasible for } C'|_{T'}] = 0,
\end{align*}

by induction, proving perfect completeness.

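To prove soundness, let $M' = \max x'_i$. Then

\begin{align*}
\Pr[\mathrm{Trace}_{n+1}(x) \in [n+1] \setminus T]
&\leq \Pr[\mathrm{Trace}_{n+1}(x) = n+1 \wedge (n+1) \notin T] + \Pr[\mathrm{Trace}_{n+1}(x) \in [n] \setminus T] \\
&\leq \Pr[x^{(M'+1)} = x_{n+1}^{(M'+1)} \wedge (n+1) \notin T] + \Pr[\mathrm{Trace}_n(x') \in [n] \setminus T] \\
&\leq \frac{1}{b(n)} + \sum_{j=1}^{n-1} \frac{1}{b(j)} = \sum_{j=1}^{n} \frac{1}{b(j)} < \xi.
\end{align*}

□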


Combining Lemmas B.3 and B.4 yields Theorem 1.8. 

C Another Reduction from Releasing Thresholds to the Interior
Point Problem

We give a somewhat different reduction showing that solving the interior point problem enables
us to $\alpha$-accurately release thresholds with a $\mathrm{polylog}(1/\alpha)/\alpha$ blowup in sample complexity. It gives
qualitatively the same parameters as Algorithm Thresh used to prove Theorem 4.6, but we believe 
the ideas used for this reduction may be useful in the design of other differentially private algorithms. 

This reduction computes approximate $(\alpha/3)$-quantiles of its input, which can then be used to
release thresholds with $\alpha$-accuracy. To do so, it uses the strategy of [DNPR10] of using a complete
binary tree to generate a sequence of $k = 3/\alpha$ noise values. The tree has $k$ leaves and depth $\log k$,
and at each node in the tree we sample a Laplace random variable. The noise value corresponding
to a leaf is the sum of the samples along the path from that leaf to the root.

We take the sorted input database and divide it into equal-size blocks around the $k$ $(\alpha/3)$-quantiles,
and perturb the boundaries of the blocks by the $k$ noise values. Solving the interior point
problem on these buckets then gives approximate $(\alpha/3)$-quantiles. Moreover, the noisy bucketing
step ensures that the final algorithm is differentially private.

We formally describe this algorithm as Thresh2 below. Let $R$ be an $(\varepsilon,\delta)$-differentially private
mechanism for solving the interior point problem on $X$ that succeeds with probability at least
$1 - \alpha\beta/6$ on databases of size $m$. In the algorithm below, let $P(i)$ denote the set of prefixes of the
binary representation of $i$ (including the empty prefix).


Algorithm 6 Thresh2($D$)

Input: Database $D = (x_1, \ldots, x_n) \in X^n$

1. Sort $D$ in nondecreasing order

2. Let $k = 3/\alpha$ be a power of 2

3. For each $s \in \{0,1\}^{\ell}$ with $0 \leq \ell \leq \log k$, sample $\nu_s \sim \mathrm{Lap}((\log k)/2\varepsilon)$

4. For each $i = 1, \ldots, k$, let $\eta_i = \sum_{s \in P(i)} \nu_s$

5. Let $T_0 = \alpha n/6$, $T_1 = \alpha n/2 + \eta_1$, $\ldots$, $T_{k-2} = \alpha n/6 + \alpha(k-2)n/3 + \eta_{k-2}$, $T_{k-1} = n - \alpha n/6$

6. Divide $D$ into blocks $D_1, \ldots, D_{k-1}$, where $D_i$ consists of the sorted elements at positions between $T_{i-1}$ and $T_i$ (note $D_i$ may be empty)

7. Release $R(D_1), \ldots, R(D_{k-1})$, interpreted as approximate $(\alpha/3)$-quantiles.
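
Steps 3 and 4 (the tree-based noise of [DNPR10]) can be sketched as follows. Each node of a complete binary tree with $k$ leaves receives an independent Laplace sample, and the noise for a leaf is the sum of the samples on its root-to-leaf path. The Laplace scale is left as a parameter rather than fixed, leaves are indexed from 0 to $k-1$ for convenience, and the function names are hypothetical. These are assumptions of the sketch, not choices made in the paper.

```python
import random
from typing import Dict, List, Tuple


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Lap(scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def tree_noise(k: int, scale: float, rng: random.Random) -> Tuple[Dict[str, float], List[float]]:
    """Return (noise per tree node, noise per leaf) for a complete binary tree with k leaves.

    Nodes are labelled by binary strings; the empty string is the root.
    """
    depth = k.bit_length() - 1
    if k < 1 or 2 ** depth != k:
        raise ValueError("k must be a power of 2")
    node_noise: Dict[str, float] = {}
    for level in range(depth + 1):
        for index in range(2 ** level):
            label = format(index, "b").zfill(level) if level else ""
            node_noise[label] = laplace(scale, rng)
    leaf_noise: List[float] = []
    for leaf in range(k):
        label = format(leaf, "b").zfill(depth) if depth else ""
        # Sum over all prefixes of the leaf label, including the empty prefix.
        leaf_noise.append(sum(node_noise[label[:l]] for l in range(depth + 1)))
    return node_noise, leaf_noise


nodes, eta = tree_noise(8, 1.0, random.Random(0))
```

Two sibling leaves share every sample except the one at the leaf itself, so their noise values differ only by the difference of two leaf samples. This correlation across leaves is what distinguishes the tree-based noise from adding independent noise to each boundary.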


We will show that this algorithm satisfies $(3\varepsilon, (1 + e^{\varepsilon})\delta)$-differential privacy, and is able to
release approximate $k(= 3/\alpha)$-quantiles with $(\alpha/3, \beta)$-accuracy, and hence $(\alpha, \beta)$-accurate answers
to threshold queries, as long as


n > max 


6m 991og^'^(l/a) 






Privacy. Let $D = (x_1, \ldots, x_n)$ where $x_1 \leq x_2 \leq \cdots \leq x_n$, and let $D' = (x_1, \ldots, x'_i, \ldots, x_n)$.
Assume without loss of generality that $x'_i \geq x_{i+1}$, and suppose

\[ x_1 \leq \cdots \leq x_{i-1} \leq x_{i+1} \leq \cdots \leq x_j \leq x'_i \leq x_{j+1} \leq \cdots \leq x_n. \]

Consider vectors of noise values $\nu = (\nu_1, \nu_2, \ldots, \nu_m)$. Then there is a bijection between noise vectors
$\nu$ and noise vectors $\nu'$ such that $D$ partitioned according to $\nu$ and $D'$ partitioned according to $\nu'$
differ on at most 2 blocks (cf. [DNPR10]). Moreover, this bijection changes at most $2 \log m$ values
$\nu_s$ by at most 1. Thus under this mapping, noise vector $\nu'$ is sampled with probability at most $e^{\varepsilon}$
times the probability $\nu$ is sampled. We get that for any set $S$,

\begin{align*}
\Pr[(M(D_1), \ldots, M(D_m)) \in S] &\leq e^{\varepsilon}\left(e^{2\varepsilon} \Pr[(M(D'_1), \ldots, M(D'_m)) \in S]\right) + (1 + e^{\varepsilon})\delta \\
&= e^{3\varepsilon} \Pr[(M(D'_1), \ldots, M(D'_m)) \in S] + (1 + e^{\varepsilon})\delta.
\end{align*}

Utility. We can produce $(\alpha/3)$-accurate estimates of every quantile as long as

1. Every noise value has magnitude at most $\alpha n/3$

2. Every execution of $R$ succeeds

By the analysis of Lemma 4.7 in [DNPR10], with probability at least $1 - \beta/2$, every noise value
r/j is bounded by 11 log^'^(l/a)/e < an/12. This suffices to achieve item 1. Moreover, conditioned 
on the noise values being so bounded, each $|D_i| \geq \alpha n/6 \geq m$, so each execution of $R$ individually
succeeds with probability $1 - \alpha\beta/6$. Hence they all succeed simultaneously with probability at least
$1 - \beta/2$, giving item 2.